AMERICAN STATISTICAL ASSOCIATION

Its membership is not confined to professional statisticians but includes
economists, business executives, research directors, government officials,
university professors, and other persons who are seriously interested in the
application of statistical methods to practical problems, in the development
of more useful methods, and in the improvement of basic statistical data.
Engineers, mathematicians, biologists, actuaries, sociologists, psychologists,
and representatives of many other professions are included in the membership
of the Association.
Student membership 4.00 
Introductory membership the first payment of 
cante under $0 years of age) 4.00 
Members subscription to Biometrics 
Associate membership in the Biometrics Section 4.00




Subscription rate, $8.00 per annum. Prices for back issues available on request.


Additional information about the Association and membership application
forms may be obtained from the Secretary, 1108 16th Street, N. W., Washington
6, D.C.


The Editors welcome the submission of articles and notes for possible
publication in the JOURNAL. Manuscripts should be sent to the Editor, Journal
of the American Statistical Association, 1108 16th Street, N. W., Washington
6, D.C. (1) Send two or more copies of your manuscript (preferably in separate
envelopes, to avoid danger of loss). (2) Leave one half of the first page of
your manuscript blank, to be used for instructions to the printer. (3) Attach
to the front of your manuscript a fifty word outline or summary. Authors who
wish suggestions about the preparation of manuscripts and charts or
information about editorial policies should address their inquiries to the
Editor. Book reviews and books for review should be sent to Professor [...],
207 [...] Hall, University of Chicago, Chicago 37, Illinois.
JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


VOLUME 45 DECEMBER 1950 NUMBER 252


ARTICLES
Who Are the Unemployed? . . . Philip M. Hauser and Robert B. Pearl
Some Sampling Simplified . . . John W. Tukey
The Effectiveness of Quality Control Charts . . . Leo A. Aroian and Howard Levene
Two-Choice Selection . . . Irwin Bross
Operations Analysis and the Theory of Games . . . Leonard Gillman
Design of Experiments for Most Precise Slope Estimation or Linear
  Extrapolation . . . Cuthbert Daniel and [...] Heerema
Small [...] . . . J. H. Coune
Correction to “Some New Aspects of the Application of Maximum Likelihood
  to the Calculation of the Dosage Response Curve” . . . Jerome Cornfield and Nathan Mantel
Index of Volume 45, 1950 (Nos. 249, 250, 251, 252)
Book Reviews, by Author

BOOK REVIEWS
Mann, H. B., Analysis and Design of Experiments . . . Andrew G. Clark
[...] . . . George W. Schustek
Mather, K., Biometrical Genetics . . . James F. Crow
[...] . . . N. Metropolis
[...], Computation of Elementary Statistics . . . Albert H. Bowker
[...], George [...], Economic Statistics . . . Donald W. Paden
Morgenstern, Oskar, On the Accuracy of Economic Observations . . . Simon Kuznets
Moore, Geoffrey H., Statistical Indicators of Cyclical Revivals and
  Recessions . . . Walter E. Hoadley, Jr.
Studies in Income and Wealth, Volume Twelve . . . John B. Canning
Stigler, George J., Employment and Compensation in Education
Meriam, Lewis, Schlotterbeck, Karl T., and Maroney, Mildred,
  The Cost and Financing of Social Security . . . H. W. Steinhaus

A cumulative Index to Volumes 1-34, 1888-1939, and Annual Indexes
thereafter, may be obtained from the Office of the Secretary of the American
Statistical Association.




1950 


OFFICERS 
AMERICAN STATISTICAL ASSOCIATION 


President: S. S. Wilks; President-Elect: L. J. Reed; Vice-Presidents:
H. A. Freeman, D. S. Brady, P. M. Hauser; Secretary-Treasurer:
Samuel Weiss; Directors: G. M. Cox, W. E. Deming, C. H. Goulden,
F. F. Stephan, W. L. Thorp, L. L. Thurstone

Members of the Council: C. M. Armstrong, W. S. Brush, W. G. Cochran,
D. R. G. Cowan, M. I. Gershenson, M. H. Hansen, H. L. Jones, T. J.
Mills, P. R. Rider, David Schneider, J. R. Stockton, E. J. Swan,
J. W. Tukey, Sylvia Weyl


EDITORIAL COMMITTEE 


W. G. Cochran, Chairman; W. Allen Wallis, Review Editor; Milton 
Friedman, Meyer A. Girshick, Jacob Marschak, Douglas Scates, 
Mortimer Spiegelman 


Published Quarterly by the AMERICAN STATISTICAL ASSOCIATION
Publication Office: 450 Ahnaip Street, Menasha, Wisconsin. Editorial Office:
1108 16th Street, N.W., Washington 6, D.C. Acceptance for mailing at special
rate of postage provided for in the Act of February 28, 1925, embodied in
paragraph 4, section 538, P. L. & R., authorized March 25, 1936. Entered as
second class matter at the Post Office at Menasha, Wisconsin.


WHO ARE THE UNEMPLOYED?* 


Philip M. Hauser
University of Chicago†
AND
Robert B. Pearl
Bureau of the Census


Unemployment statistics in the United States on a compre- 
hensive scale are available only from census data or sample 
surveys. Since 1940 these have been developed so that there 
are now available on a current basis, measures of the volume 
of employment, of the characteristics of the unemployed, and 
of the total labor force. 


Despite the relatively great utilization of the science and art of
statistics in the United States, a continuous statistical series of the
volume of unemployment based on direct measurement can be constructed
only from 1940 to date. Moreover, even such a series can be made
comparable only through ingenious splicing procedures which involve
adjustment of the 1940 bench-mark Population Census data and of the
several breaks in the Monthly Current Population Survey occasioned by
modified and improved procedures in the development of the current
series.¹

To be sure, measurements of unemployment on a national scale were
undertaken in the United States prior to the 1940 Census and prior to
the initiation of the Monthly Report on Unemployment (as the survey
was originally called) by the Works Progress Administration at about
that time. A national census of unemployment and partial employment
was taken in November 1937 on direct order of the Congress.² A number
of the decennial Population Censuses contained some inquiries relating
to unemployment, directly or indirectly, including the Censuses of
1880, 1890, 1900, 1910, and 1930. But the results of the Censuses of
1880 and 1910 were never published, the 1890 and 1900 data were
incomplete and inadequate, and the publication of the 1930
unemployment census data precipitated a prolonged and sometimes
bitter controversy.³

* Presented before the American Statistical Association, 109th Annual Meeting, New York City,
December 29, 1949.

† On loan to the Bureau of the Census July 1949 to March 1950.

¹ Durand, John D., “Development of the Labor Force Concept, 1930-40” in Labor Force Definition
and Measurement, Hagood and Ducoff, Social Science Research Council Bulletin 56, 1947, pp. 80-90,
and Hauser, P. M., “The Labor Force and Gainful Workers, Concept, Measurement, and
Comparability,” American Journal of Sociology, Volume LIV, No. 4, January 1949, pp. 338-355.

² U. S. Bureau of the Census, “Census of Partial Employment, Unemployment and Occupations:
1937, Final Report on Total and Partial Unemployment,” Vol. IV.

During most of the depression years of the 1930’s when this nation 
admittedly experienced its severest and most prolonged period of mass 
unemployment, no one knew how many unemployed there were. One 
could, of course, pick a number from among the assortment of available 
unemployment estimates, but one could hardly be satisfied with a 
figure that in essence was a residual “guesstimate”—the difference be- 
tween an estimated number of gainful workers and an estimated vol- 
ume of employment both of which were subject to large and unknown 
error. 

It seems a little incredible today that we managed to stumble 
through most of the 30’s without having had an acceptable measure- 
ment of the extent of the mass unemployment we experienced or the 
characteristics of the unemployed. Fortunately, some progress has been 
made since the 30’s in a statistical as well as economic, social and political
sense. The manifest need for national unemployment data gave
impetus to theoretical and empirical research centered around a num- 
ber of experimental surveys as the result of which new concepts and 
methods of measuring unemployment emerged. More specifically, the 
development of the labor force concept together with the great im- 
provement in methods of sampling human populations made possible 
both more reliable and precise decennial census measurements and cur- 
rent monthly measurements of national unemployment. 

The answer to the question—who are the unemployed—is in a real 
sense a function of the source of the information and the method of 
measurement used. Unemployment statistics obtained through a sys- 
tem of employment exchanges, such as are prevalent in Great Britain 
and on the Continent provide a different answer to this question than 
unemployment statistics obtained through a census or sample survey. 
Data obtained as a by-product of the administration of unemployment 
compensation systems or of employment services have many advantages,
especially for administrative purposes, over those obtained through
survey methods, but they have some disadvantages too. It is usually
difficult to relate such administrative data to a total labor force or
population base and, therefore, difficult to analyze such data for
research, policy or administrative purposes against a background of
population changes or labor force trends. In any case, we as yet have
no choice in the United States of obtaining complete unemployment
figures from alternative sources. Both our employment services and our
unemployment compensation systems cover only part of the labor force.
The only comprehensive source of unemployment statistics in the United
States is, therefore, a census or sample survey, which fortunately
simultaneously provides us with a picture of the total population and
of the labor force supply from which the unemployed are drawn.

³ See, for example, Persons, Charles E., “Census Reports on Unemployment in April 1930,” Annals
of the American Academy of Political and Social Science, 154: 12-16, March 1931; Van Kleeck, Mary,
“The Federal Unemployment Census of 1930,” Journal of the American Statistical Association,
26 (suppl.): 189-200, March 1931; and Arner, George B. L., “The Census of Unemployment,” Journal
of the American Statistical Association, 26: 303-318, September 1931.

Unemployment statistics based on the survey method are in turn a 
function of the specific conceptual framework and particular methods 
of measurement used. In the United States, the labor force is defined 
as the sum of the employed and the unemployed, or in essence, the 
economically active population. Labor force status is determined from 
the activity of a person during a specified week as reported to a census 
enumerator in reply to a series of ordered questions with assigned pri- 
orities. Persons reported as working or as with jobs but not working for 
specified reasons are defined as employed. Those who report that they 
were looking for work or would have been looking for work except for 
specified reasons are defined as unemployed. It is to be observed that,
in this “labor force” approach, although a special effort is made to
report a person’s activity on an objective and behavioristic basis,
both the employed and the unemployed include “inactive” as well as
“active” groups. Thus, among the employed are included not only those
working during the specified week but those with jobs but not actually 
working that week for such reasons as vacation, illness, bad weather, 
or temporary (less than 30 day) layoff. Similarly, among the unem- 
ployed, are found not only persons looking for work, but those who 
would have been looking for work except for reasons of temporary ill- 
ness, belief no work was available, or because they were awaiting recall 
to jobs from which they were on indefinite layoff. In the early 40’s also, 
those persons still on the emergency work relief rolls were included with 
the unemployed. 
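
The classification rules described above can be sketched in a few lines of Python. This is only an illustration: the field names, the `WeekReport` record, and the exact priority order (any job attachment outranks job seeking) are assumptions made for the example, since the text says only that the questions are ordered with assigned priorities.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EMPLOYED = "employed"
    UNEMPLOYED = "unemployed"
    NOT_IN_LABOR_FORCE = "not in labor force"


@dataclass
class WeekReport:
    """Hypothetical record of one person's answers for the survey week."""
    worked: bool = False             # at work during the specified week
    job_not_at_work: bool = False    # vacation, illness, bad weather, short layoff
    looking_for_work: bool = False   # actively seeking work
    would_have_looked: bool = False  # temporary illness, believed no work, awaiting recall


def classify(report: WeekReport) -> Status:
    # Assumed priority: any attachment to a job outranks job seeking.
    if report.worked or report.job_not_at_work:
        return Status.EMPLOYED
    if report.looking_for_work or report.would_have_looked:
        return Status.UNEMPLOYED
    return Status.NOT_IN_LABOR_FORCE


def labor_force_size(reports: list[WeekReport]) -> int:
    # The labor force is the sum of the employed and the unemployed.
    return sum(classify(r) is not Status.NOT_IN_LABOR_FORCE for r in reports)
```

Under these assumed rules, a person who both worked and looked for work in the same week is counted as employed, which shows how the "inactive" groups end up on both sides of the line.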

It is clear that the assignment of some of the “inactive” groups to 
either the employed or the unemployed required decisions that, by 
some standards, would appear arbitrary. However, both the number 
of unemployed and their characteristics would vary to only a limited 


1080 

irieg 

en- 

the 

900 
1930 
mes 
‘ion 
ble 
La 
be- 
ol- 
vn 
le 
n 
e 


482 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1950 


extent with different allocations of these marginal categories. 
Statistics on total unemployment in the United States are then based 
on the survey method and in accordance with the “labor force” con- 
ceptual framework. This approach was first used in the 1940 Census of 
Population and has, with modification of methods of measurement 
rather than concept, been used in the monthly Current Population 
Surveys since that time. With this conceptual background in mind, let 
us proceed to find out, in a statistical sense,—who are the unemployed. 


THE VOLUME AND INCIDENCE OF UNEMPLOYMENT 


The Population Census taken in April 1940 recorded a relatively 
large volume of unemployment for March 1940 of 8.4 million. This high 
level of unemployment, a heritage from the depression 30’s, declined 
rapidly with increased economic activity resulting from national de- 
fense measures and even more rapidly with the pressure on our labor 
supply created by mobilization for war and all-out war production 
(Table 1). It is noteworthy that in 1944, average unemployment 
reached an unprecedented low point of 700,000, a figure far below that 
normally assumed as a minimum for frictional unemployment. This 
remarkably low unemployment figure for a labor force which averaged 
almost 66 million persons, including the armed forces, and 54.6 million 
civilians in 1944 undoubtedly is attributable in large measure to war 
manpower controls and other factors tending to reduce labor turnover 
to far below normal levels. With the end of hostilities in 1945, and the 
demobilization of armed forces which followed, average unemployment 
increased, but still remained at a relatively low figure. In 1946, average 
annual unemployment rose to 2.3 million. With an unprecedented 
peacetime production stimulated by United States aid abroad, it again 
declined to average 2.1 million in both 1947 and 1948. Finally, with the 
economic adjustment in the early part of 1949, unemployment again 
increased during the spring and summer months. Average annual un- 
employment for 1949 will probably be in the neighborhood of 3.4 
million persons. Even this figure, however, the highest recorded for any 
year since 1941, is relatively small as compared with the annual average 
total of 62 million civilian workers. 

TABLE 1
TREND IN UNEMPLOYMENT, BY SEX, 1940-1949
QUARTERLY MONTHS AND ANNUAL AVERAGES*
(Thousands of persons 14 years of age and over)

Month and year       | Both sexes |  Male | Female || Month and year       | Both sexes |  Male | Female
March 1940†          |      8,360 | 6,440 |  1,920 || January 1945         |        630 |   340 |    290
April 1940           |      8,230 | 6,220 |  2,010 || April 1945           |        530 |   270 |    260
July 1940            |      9,150 | 6,270 |  2,880 || July 1945            |        950 |   480 |    470
October 1940         |      7,240 | 5,200 |  2,040 || October 1945         |      1,560 |   940 |    620
Annual Average 1940‡ |      8,120 | 5,930 |  2,190 || Annual Average 1945  |      1,040 |   620 |    420
January 1941         |      7,410 | 5,560 |  1,850 || January 1946         |      2,300 | 1,770 |    530
April 1941           |      6,380 | 4,690 |  1,690 || April 1946           |      2,330 | 1,870 |    460
July 1941            |      6,000 | 4,000 |  2,000 || July 1946            |      2,270 | 1,760 |    510
October 1941         |      3,840 | 2,480 |  1,360 || October 1946         |      1,960 | 1,550 |    410
Annual Average 1941  |      5,560 | 3,920 |  1,640 || Annual Average 1946  |      2,270 | 1,800 |    470
January 1942         |      4,320 | 3,020 |  1,300 || January 1947         |      2,400 | 1,950 |    450
April 1942           |      3,050 | 2,040 |  1,010 || April 1947           |      2,420 | 1,900 |    520
July 1942            |      2,830 | 1,720 |  1,110 || July 1947            |      2,584 | 1,789 |    795
October 1942         |      1,610 |   950 |    660 || October 1947         |      1,687 | 1,183 |    504
Annual Average 1942  |      2,660 | 1,720 |    940 || Annual Average 1947  |      2,142 | 1,595 |    547
January 1943         |      1,480 |   840 |    640 || January 1948         |      2,065 | 1,574 |    491
April 1943           |      1,010 |   510 |    500 || April 1948           |      2,193 | 1,567 |    626
July 1943            |      1,390 |   750 |    640 || July 1948            |      2,227 | 1,448 |    779
October 1943         |        730 |   400 |    380 || October 1948         |      1,642 | 1,088 |    554
Annual Average 1943  |      1,070 |   570 |    500 || Annual Average 1948  |      2,064 | 1,430 |    633
January 1944         |        810 |     … |    350 || January 1949         |      2,664 | 2,011 |    653
April 1944           |        630 |   360 |    270 || April 1949           |      3,016 | 2,205 |    811
July 1944            |          … |   440 |    450 || July 1949            |      4,095 | 2,845 |  1,250
October 1944         |        440 |   220 |    220 || October 1949         |      3,576 | 2,563 |  1,013
Annual Average 1944  |          … |   350 |    320 || Annual Average 1949  |      3,395 | 2,415 |    980

* Based on arithmetic mean of 12 monthly estimates.
† Revised 1940 Census of Population data relating to week of March 24-30.
‡ Includes an allowance for January and February.
Source: 1940 Census of Population and Current Population Survey, Bureau of the Census.

The rate of unemployment, by which we mean, of course, the proportion
of those in the civilian labor force who were unemployed, has
fluctuated over a correspondingly wide range (Table 2). The
unemployment rate in March 1940, on an adjusted basis, was 15½ per
cent, and, for the year as a whole, was estimated at 14½ per cent.
Even in 1941, during the so-called “defense boom,” it was relatively
high, in the neighborhood of 10 per cent. At the height of the war, in
1944, the rate of just slightly over 1 per cent clearly points up the
virtual disappearance of unemployment. In 1947 and 1948, which many
economists regard as the standard for a full employment economy in
peace time, the unemployment rate averaged just 3½ per cent. The
estimated average rate for 1949 of 5½ per cent, although considerably
above other post-war years, is still moderate in the light of pre-war
experience.⁴
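
As a rough check on the arithmetic, the rate can be recomputed from the rounded levels quoted in the text, and the sampling intervals given in footnote 4 can be turned into approximate standard errors. The sketch below is illustrative only: the function names are hypothetical, and treating the "19 out of 20" interval as plus or minus two standard errors is an assumption, not something the article states.

```python
def unemployment_rate(unemployed: float, civilian_labor_force: float) -> float:
    """Per cent of the civilian labor force that is unemployed."""
    return 100.0 * unemployed / civilian_labor_force


def implied_standard_error(low: float, high: float, multiplier: float = 2.0) -> float:
    """Back out a standard error from a symmetric interval.

    Assumes the interval is the estimate plus or minus `multiplier`
    standard errors; the value 2.0 is an assumption for illustration.
    """
    return (high - low) / 2.0 / multiplier


# Rounded figures quoted in the text, in thousands of persons.
rate_1944 = unemployment_rate(700, 54_600)    # 1944 average, civilian labor force
rate_1949 = unemployment_rate(3_400, 62_000)  # 1949 approximate average

# Footnote 4 intervals for the 1949 annual averages.
se_level = implied_standard_error(3_278, 3_512)  # thousands of persons
se_rate = implied_standard_error(5.3, 5.7)       # percentage points

print(f"1944 rate: {rate_1944:.1f} per cent")
print(f"1949 rate: {rate_1949:.1f} per cent")
print(f"implied standard error of the 1949 level: {se_level:.1f} thousand")
print(f"implied standard error of the 1949 rate: {se_rate:.2f} points")
```

Because the inputs are rounded, the results are approximate: about 1.3 per cent for 1944 and 5.5 per cent for 1949, with an implied standard error of roughly 58.5 thousand persons on the 1949 level.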


⁴ Because the estimates since 1940 are based on a sample, they may differ somewhat from the figures
that would have been obtained from a complete census using the same schedules, instructions, and
enumerators. As an illustration, the average number of persons unemployed during 1949 was estimated
at 3,395,000. The chances are about 19 out of 20 that a complete census would have yielded a figure
between 3,278,000 and 3,512,000. Similarly, the average unemployment rate during 1949 was estimated
at 5.5 per cent. The chances are about 19 out of 20 that a complete census would have yielded a figure
between 5.3 and 5.7 per cent. Estimates for a single month are subject to greater sampling variability.
Additional details on the sampling variability of estimates may be obtained by writing directly to the
Bureau of the Census.

TABLE 2
UNEMPLOYMENT RATES,* BY SEX, 1940-1949
QUARTERLY MONTHS AND ANNUAL AVERAGES†
(In per cent)

Month and year       | Both sexes |  Male | Female || Month and year       | Both sexes |  Male | Female
March 1940‡          |       15.4 |  15.9 |   13.9 || January 1945         |        1.2 |   1.0 |    1.6
April 1940           |       15.0 |  15.2 |   14.3 || April 1945           |        1.0 |   0.8 |    1.3
July 1940            |       15.8 |  14.5 |   19.4 || July 1945            |          … |   1.4 |    2.8
October 1940         |       13.0 |  12.5 |   14.3 || October 1945         |        2.9 |   2.7 |    3.3
Annual Average 1940§ |       14.6 |  14.3 |   15.5 || Annual Average 1945  |        1.9 |   1.8 |    2.2
January 1941         |       13.7 |  13.6 |   14.1 || January 1946         |        4.3 |   4.8 |    3.3
April 1941           |       11.6 |  11.4 |   12.1 || April 1946           |        4.1 |   4.7 |    2.8
July 1941            |       10.3 |   9.4 |   12.9 || July 1946            |        3.8 |   4.1 |    2.9
October 1941         |        6.9 |   6.1 |    8.9 || October 1946         |        3.3 |   3.7 |    2.4
Annual Average 1941  |        9.9 |   9.5 |   11.2 || Annual Average 1946  |          … |   4.4 |    2.8
January 1942         |        7.9 |   7.6 |    9.2 || January 1947         |        4.2 |   4.7 |    2.8
April 1942           |          … |     … |      … || April 1947           |          … |     … |      …
July 1942            |          … |   4.1 |    6.7 || July 1947            |          … |   4.0 |    4.5
October 1942         |        2.8 |   2.4 |    3.8 || October 1947         |        2.8 |   2.7 |    2.9
Annual Average 1942  |        4.7 |   4.3 |    5.8 || Annual Average 1947  |        3.6 |   3.7 |    3.2
January 1943         |        2.7 |   2.2 |    3.7 || January 1948         |        3.5 |   3.7 |    3.0
April 1943           |          … |   1.4 |    2.8 || April 1948           |        3.6 |   3.6 |    3.6
July 1943            |        2.4 |   2.0 |    3.2 || July 1948            |        3.5 |   3.2 |    4.2
October 1943         |        1.4 |   1.1 |    2.0 || October 1948         |          … |     … |      …
Annual Average 1943  |          … |     … |      … || Annual Average 1948  |        3.4 |   3.3 |    3.6
January 1944         |        1.5 |   1.3 |    2.0 || January 1949         |        4.4 |   4.7 |    3.9
April 1944           |          … |     … |      … || April 1949           |          … |     … |      …
July 1944            |        1.6 |   1.2 |    2.2 || July 1949            |        6.4 |   6.3 |    6.7
October 1944         |        0.8 |   0.6 |    1.1 || October 1949         |          … |   5.8 |    5.4
Annual Average 1944  |        1.2 |   1.0 |    1.7 || Annual Average 1949  |        5.5 |   5.5 |    5.4

* Per cent of those in civilian labor force who were unemployed.
† Based on weighted arithmetic mean of 12 monthly estimates.
‡ Revised 1940 Census of Population data relating to week of March 24-30.
§ Includes an allowance for January and February.
Source: 1940 Census of Population and Current Population Survey, Bureau of the Census.

Throughout this period there were, of course, important seasonal
swings in unemployment and considerably larger “gross changes” than
net changes among the unemployed. A seasonal peak in unemployment
is generally reached in June and July as large numbers of young persons
enter the labor market, some to find post-graduation jobs, but most to
seek temporary summer work. The level of unemployment in these
months is usually about 12 per cent or more above the average for the
year. “Gross changes” in unemployment are now made available
regularly by the Bureau of the Census. These data, derived from special
tabulations of identical persons interviewed in successive months in the
current surveys, show the pattern of gross movements in the labor
force, that is, the number who entered or left the labor force from one
month to the next, the number whose status changed from employed
to unemployed, or in the opposite direction, and so forth.⁵
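
The distinction between gross and net changes can be illustrated with a small tabulation of matched persons. The sketch below is not the Bureau's procedure; the status codes, function names, and sample counts are all hypothetical and serve only to show how sizable gross movements can sit behind a small net change.

```python
from collections import Counter

# Status codes (hypothetical): E = employed, U = unemployed, N = not in labor force.


def gross_flows(matched_pairs):
    """Count (status last month, status this month) pairs for identical persons."""
    return Counter(matched_pairs)


def flows_for(flows, status):
    """Return (inflow, outflow, net change) for one status."""
    inflow = sum(n for (prev, curr), n in flows.items()
                 if curr == status and prev != status)
    outflow = sum(n for (prev, curr), n in flows.items()
                  if prev == status and curr != status)
    return inflow, outflow, inflow - outflow


# Invented sample of matched interviews, for illustration only.
sample = (
    [("E", "E")] * 90 + [("E", "U")] * 3 + [("U", "E")] * 4 +
    [("U", "U")] * 5 + [("N", "U")] * 2 + [("U", "N")] * 2 +
    [("N", "N")] * 40
)

flows = gross_flows(sample)
inflow, outflow, net = flows_for(flows, "U")
print(f"entered unemployment: {inflow}, left unemployment: {outflow}, net change: {net}")
```

In this invented sample, eleven persons move into or out of unemployment while the net change is a decline of one, which is the sense in which gross changes are "considerably larger" than net changes.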


CHARACTERISTICS OF THE UNEMPLOYED 


The volume of unemployment, especially as related to the size of the 
labor force, has proven to be one of the most useful and sensitive of our 
economic indicators. However, an analysis of the nature and origin of 
unemployment requires an examination of the personal and economic 
characteristics of those in the market for jobs. It is perhaps from this 
standpoint that the population approach used in the census and in the 
monthly sample surveys is most fruitful. Not only are data regularly 
collected and tabulated on the characteristics of jobless persons, but, 
with similar information available for all workers, it is possible to ob- 
serve the incidence of unemployment in the various population and 
economic groups. 

For purposes of public policy, as well as for labor market analysis, 
information is needed also on the geographic distribution of the labor 
force and the unemployed. Such information is, of course, available 
from the census in the greatest detail. Unfortunately, however, and 
largely because of budgetary limitations, the current survey sample is 
constructed in such a manner as to provide national estimates only. It 
has been the policy of the Census Bureau to recommend that these 
surveys be expanded periodically to provide at least a minimum of 
geographic detail, particularly in periods of rising unemployment. How- 
ever, on essentially only one occasion since 1940, have appropriations 
been made available for this purpose,⁶ and, with the 1950 census almost
at hand, there is undoubtedly less justification for such a program in the
immediate future. Thus, on a regular current basis, the only source of
information on the incidence of unemployment geographically is the
figures derived from the records of the unemployment insurance system.

⁵ See U. S. Bureau of the Census, “Shifts in the Labor Force, July-November, 1945,” Labor Force
Memorandum No. 2, June 1947, for a summary of an earlier study in this field. Data for 1948 can be
found in Current Population Report Series P-50, No. 16, October 1949, and for 1949 in monthly reports
in the P-59 series.

⁶ In April 1947, the sample was expanded momentarily to provide labor force estimates for four
broad geographic regions (see U. S. Bureau of the Census, “Regional Distribution of the Civilian Labor
Force of the United States: April 1947,” Current Population Reports Series P-50, No. 6, June 1949).
Also, a number of local area surveys in the larger cities and metropolitan areas were conducted in
October and November, 1946, and in April, 1947 (see Current Population Reports in the P-51 and P-LF
series released periodically during 1947).

In general, an examination of the characteristics of the unemployed 
reveals some striking similarities in distributions in periods of widely 
divergent levels of business activity. As already noted, the volume and 
rate of unemployment have fluctuated over a wide range during the 
past decade. However, especially if the war period is excluded, it has 
been essentially the same population groups and the same economic 
groups that repeatedly show the highest incidence of unemployment. 


SEX AND AGE OF THE UNEMPLOYED 


In the sex distribution of the unemployed are mirrored many of the
significant labor market developments of the past decade. At the time 
of the 1940 census, men outnumbered women by a ratio of more than 
3 to 1 in the ranks of the jobless. However, with the entry of women 
into the labor force in unprecedented numbers during the war and with 
the inroads on male workers of the Selective Service system, the unem- 
ployed group in the war period was composed almost equally of men 
and women. In the early reconversion period, with demobilization, the 
sex ratio was quickly restored to the 1940 pattern, but as veterans again 
found their place in the civilian economy, it was eventually reduced to 
a ratio of approximately 2½ to 1 in the past 2 years.
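
The sex ratios cited in this paragraph can be recomputed from the male and female counts in Table 1. The short sketch below uses the March 1940 census figure and the annual averages for 1944, 1946, 1948, and 1949; the dictionary layout and function name are illustrative choices, not part of the source.

```python
# (male, female) unemployed, in thousands, taken from Table 1.
unemployed_by_sex = {
    "March 1940": (6_440, 1_920),
    "Average 1944": (350, 320),
    "Average 1946": (1_800, 470),
    "Average 1948": (1_430, 633),
    "Average 1949": (2_415, 980),
}


def men_per_woman(male: float, female: float) -> float:
    """Number of unemployed men for each unemployed woman."""
    return male / female


ratios = {period: men_per_woman(m, f) for period, (m, f) in unemployed_by_sex.items()}

for period, ratio in ratios.items():
    print(f"{period}: {ratio:.2f} to 1")
```

The computed ratios run from about 3.4 to 1 at the 1940 census, to near parity in 1944, back above 3 to 1 in 1946, and to roughly 2.3 and 2.5 to 1 in 1948 and 1949.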

Only during the war years also, were material changes observed in 
the age distribution of the unemployed (Table 3). In the post-war years,
if allowance is again made for the first impact of demobilization, the
distribution has been essentially the same as at the time of the 1940 
census. During the war, it was at the age extremes that the distribution 
was altered most. In April 1940 (and again in the post-war years), ap- 
proximately 1 in every 6 jobless workers was in his teens (14 to 19 years 
old). The greatly increased labor force participation of youth (most 
striking perhaps among young girls) raised this proportion to about 1 in 
4 during the war period. For comparability with the census, these fig- 
ures relate to April. At the seasonal peak in July, youngsters consti- 
tuted about half of the jobless total at the height of the war. At the 
other end of the age scale, the proportion of persons 65 years old and 
over among the unemployed—although relatively small throughout the 
period—increased appreciably from under 3 per cent in 1940 to 7 or 8 
per cent at the war peak, before receding gradually toward the pre-war 
level. 

UNEMPLOYMENT RATES FOR POPULATION GROUPS 


Again aside from the war years, there have been no striking
differences over the past decade between the unemployment rates of men
and women. In March 1940 (as shown in Table 2) the unemployment rate
for men was slightly higher (about 16 per cent as compared with 14 per
cent for women). As young men in the age groups with the highest
incidence of unemployment entered the armed forces, however, the
female rate gradually gained ascendancy. At the peak of the war effort,
the unemployment rate for women was, on the average, double that for
men, largely reflecting the fact that new entrants were relatively
numerous in the female group at that time. With the deluge of veterans
into the labor market after V-J Day, this situation was reversed, albeit
for only a temporary period. In the peak years of 1947 and 1948 and
again in 1949, the male and female rates have been essentially the
same.

[TABLE 3: age distribution of the unemployed, by sex; the table body is not legible in this copy.]
* Revised 1940 Census of Population data relating to week of March 24 to 30.
Source: 1940 Census of Population and Current Population Survey, Bureau of the Census.

A well-known phenomenon is that a charting of unemployment rates 
by age will generally assume the shape of a U-curve, blunted at the 
right extreme. For women, however, there is less tendency for the 
curve to move upward past middle age, although, in some years, it has 
followed the general pattern. 

In Table 4, the unemployment rates by age and sex for 1940, 1944, 
1948, and 1949 are presented. If allowance is made for differences in 
level, it appears from the data that a fairly similar relationship existed 
in the four years in spite of radically different labor market conditions. 
The general shape of the curve hardly requires explanation. The higher 
rate for the teen-age group, where new entrants are predominant, is 
entirely logical, and the sharp downturn as workers approach their 
more productive years is also a quite natural development. The slight 
upturn past middle age may be regarded as a reflection of still existent 
age discriminations in employment. It is at the right extreme that a 
controversial element enters the picture. In 1940, the rate for workers
65 years old and over dipped downward, breaking the continuity of the
curve; in 1944 it followed the upward pattern; in 1948 it levelled off.⁷
According to some theories, the 1944 pattern is the correct one and 
deviations from it in other years are indicative of the failure of the labor 
force approach, particularly in depression years, properly to measure 
the incidence of unemployment among older workers. In defense of 
current measurements, it may be said that many older workers want 
occasional work only, and because they tend to schedule their entry 
into the labor force to coincide with the ready availability of jobs may 
never experience a period of unemployment, even in a conceptual sense. 


¹ Because these figures for 65 year-old workers for the years 1944 and 1948 are subject to appreci-
able sampling variability, part, if not most, of the apparent difference in the trend of the curves could 
be due to this factor. 




WHO ARE THE UNEMPLOYED? 489 


The existence of a large marginal group of this type may be expected 
to reduce the unemployment rate of older workers in general. In any 
event, it is clear that studies of income and savings, among other things, 
are needed to supplement labor force data for a proper appraisal of the 
economic status of older persons. 


TABLE 4 


UNEMPLOYMENT RATES BY AGE AND SEX; ANNUAL AVERAGES 
1940, 1944, 1948, AND 1949* 


(In per cent) 

Age and sex                    1940    1944    1948    1949

[row label illegible]          14.6     1.2     3.4     5.5
14 to 19 years                 31.4     3.2     7.9    11.6
[row label illegible]          18.8     2.2     5.5     8.7
[row label illegible]          11.6     1.0     2.8     4.9
[row label illegible]          12.9     0.7     2.3     3.8
[row label illegible]          14.2     0.8     2.8     4.7
65 years and over               8.8     1.0     2.8     4.6

[row label illegible]          14.3     1.0     3.3     5.5
[row label illegible]          32.8     3.0     8.3    11.9
[row label illegible]          11.3     0.7     3.3     3.8
[row label illegible]          12.4     0.5     2.3     3.9
[row label illegible]          14.9     0.8     2.8     4.9
65 years and over              10.0     3.2     3.0     4.9

[row label illegible]          15.5     4.7     3.6     5.4
20 to 24 years                 19.9     2.2     4.2     6.7
[row label illegible]          14.6     1.0     2.5     3.5
[row label illegible]          10.9     0.8     2.7     3.8
65 years and over               3.3     0.7     1.9     3.4


* For 1944, 1948, and 1949, derived from weighted arithmetic means of the 12 monthly estimates. 
For 1940, estimated from revised 1940 Census figures and less detailed age figures for remainder of 


Source: 1940 Census of Population and Current Population Survey, Bureau of the Census. 


As a postscript to this general subject, it might be added that the 
pattern of unemployment rates by age has been affected only slightly
by the upturn in unemployment in 1949. The unemployment rates have 
increased fairly uniformly in most adult age groups, and although teen- 
age workers have shown less of a change, their rate, as before, remains 
by far the highest (Table 4). 




490 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1950 


Because of the limitations of the current survey sample, relatively 
little information is available since 1940 on unemployment by color. 
However, it is fairly clear that the historical pattern of more extensive 
unemployment among nonwhite workers has continued over this pe- 
riod. In March 1940, the unemployment rate of nonwhite workers was 
approximately 18 per cent, as compared with 15 per cent for white 


TABLE 5 


UNEMPLOYMENT RATES BY COLOR AND SEX, MARCH 1940, AND 
QUARTERLY 1944 AND 1947-1949 


(In per cent) 
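                                    White                       Nonwhite
Month and year            Both sexes   Male   Female   Both sexes   Male   Female

March 1940*                  15.0      15.5    13.5       18.1      19.3    16.8
[row label illegible]         1.4       †       †          2.6       †       †
[row label illegible]         3.3       †       †          1.6       †       †
[row label illegible]         1.4       †       †          2.5       †       †
[row label illegible]         0.7       †       †          1.4       †       †
[row label illegible]         3.9       4.4     2.5        6.3       6.9     5.2
[row label illegible]         3.8       4.1     2.8        6.7       7.3     5.6
[row label illegible]         3.8       3.7     4.2        6.1       6.2     5.9
[row label illegible]         2.6       2.6     2.8        3.9       4.2     3.4
[row label illegible]         3.4       3.5     2.9        4.7       5.2     3.8
[row label illegible]         3.4       3.4     3.4        5.9       6.1     5.6
[row label illegible]         3.2       2.9     4.0        6.1       6.2     5.9
[row label illegible]         2.5       2.3     2.9        4.2       4.2     4.3
[row label illegible]         4.3       4.5     3.7        6.0       6.4     5.2
[row label illegible]         4.7       4.8     4.3        7.6       7.3     7.5
[row label illegible]         6.2       5.9     6.8        8.7      10.1     6.6
[row label illegible]         5.3       5.4     5.1        9.7      11.2     7.4


* Revised 1940 Census of Population data relating to week of March 24 to 30.
† Not available.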
Source: 1940 Census of Population and Current Population Survey, Bureau of the Census. 


workers (Table 5). During the war and in the post-war years, this 
relationship has not only persisted but the rates have shown some 
evidence of further divergence. In the past few years, the nonwhite 
unemployment rates have, on the average, been half again as large as 
those for white workers. It may be significant, however, that despite 
the higher levels of unemployment obtaining in 1949, the position of 
nonwhite workers, as revealed by comparative unemployment rates, 
has shown no signs of worsening. 
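
The "half again as large" comparison can be checked roughly against the both-sexes columns of Table 5. The sketch below is illustrative only: it assumes that the twelve rows with complete data, taken in printed order, are the quarterly observations for 1947-1949, and the variable names are hypothetical.

```python
# (white, nonwhite) both-sexes unemployment rates, in per cent, from the
# twelve Table 5 rows that report all six columns, in printed order.
# Assumption: these rows are the quarterly observations for 1947-1949.
rates = [
    (3.9, 6.3), (3.8, 6.7), (3.8, 6.1), (2.6, 3.9),
    (3.4, 4.7), (3.4, 5.9), (3.2, 6.1), (2.5, 4.2),
    (4.3, 6.0), (4.7, 7.6), (6.2, 8.7), (5.3, 9.7),
]

# Ratio of the nonwhite rate to the white rate for each observation.
ratios = [nonwhite / white for white, nonwhite in rates]
mean_ratio = sum(ratios) / len(ratios)

# March 1940 Census figures, for comparison with the later period.
ratio_1940 = 18.1 / 15.0

print(f"March 1940 ratio:    {ratio_1940:.2f}")
print(f"Mean post-war ratio: {mean_ratio:.2f}")
print(f"Post-war range:      {min(ratios):.2f} to {max(ratios):.2f}")
```

Under these assumptions the post-war ratio averages somewhat above one and a half, against about 1.2 in March 1940, which is in line with the divergence described in the text.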




OCCUPATION AND INDUSTRY OF THE UNEMPLOYED 


Among the economic characteristics of the unemployed that com- 
mand the greatest interest are distributions by occupation and indus- 
try. Unfortunately, because of various technical difficulties, it has not, 
as yet, been possible to revise the occupational and industrial distribu- 
tions prior to July 1945 to take account of improved enumerating pro- 
cedures instituted at that time in the current survey. Therefore, the 
data for recent years are not exactly comparable with those for years 
prior to 1945. However, if the figures are expressed in terms of unem- 
ployment rates—and, from the standpoint of analysis, this type of 
presentation is probably the most useful—the absence of exact com- 
parability is not likely to materially affect the trends and relationships 
inherent in the data. 

The data assembled on occupation show that certain groups, in 1940, 
in the war years, and in the post-war period experienced rates of unem- 
ployment which were consistently above average (Table 6). In large 
measure, these are the occupations which dominate the working force 
in the basic industries in the economy. Nonfarm laborers, the unskilled 
component of the nonagricultural labor force, stand out as a group with 
a uniformly high rate of unemployment. At the time of the 1940 census, 
about one out of every three workers in this group was reported as 
unemployed, approximately 3 times the rate for the entire experienced 
labor force. In subsequent years, the rates for nonfarm laborers, al- 
though reflecting the fluctuations of the economy, have been consist- 
ently 2 to 3 times the average for all groups combined. For a variety of 
reasons, the adverse position of this group is hardly surprising. Their 
lack of specialized skills and the fact that a large proportion are non- 
whites make nonfarm laborers likely candidates for dismissal when re- 
trenchments occur. Also, because of looser job attachments and rela- 
tively low pay scales which stimulate turnover, these workers probably 
contribute disproportionately to frictional unemployment. Finally, the 
fact that these figures relate to April to facilitate comparisons with the 
census shows the group (many of whom are construction workers) at a 
time when its rate of unemployment is seasonally high. In April 1948, 
for example, nonfarm laborers had an unemployment rate of 9 per cent; 
on an annual average basis, the rate was closer to 7½ per cent. However,
even if allowances were made for seasonal influences, the relative posi- 
tion of nonfarm laborers in the unemployment scale would be substan- 
tially unaltered. 

Especially in peace time, the incidence of unemployment has gen- 
erally been well above average also for the large semi-skilled industrial 


TABLE 6


UNEMPLOYMENT RATES OF THE ... LABOR ..., BY OCCUPATION GROUP, BY ...


(In per cent)


[Table printed sideways in the original; the body is not recoverable from the scan.]



group combined under the title of “operatives” and for the more highly 
skilled group of craftsmen and foremen. The various frictions affecting 
the basic industries in the economy and the almost immediate re- 
sponsiveness of these industries to declines in business activity are 
important factors in the high unemployment rate of these two groups 
of workers. During the war, however, with the heavy demand for 
workers with industrial skills, the rates for both operatives and crafts- 
men fell below those for some of the other occupation groups. Service 
workers have also revealed extensive unemployment in most years and, 
during the war, if allowance is made for seasonal factors, shared with 
nonfarm laborers the dubious distinction of first position in the unem- 
ployment scale. In peace time, the high unemployment rate for service 
workers may reflect the less stable job attachments in this field and, 
during the war, the search for better paying jobs in war industries was 
undoubtedly a factor also. 

White collar workers, in general, occupy a median position insofar as 
unemployment rates are concerned. The lowest rates are consistently 
shown for farmers, nonfarm proprietors, and for professional workers. 
Of course, rates of unemployment are not too effective in diagnosing 
the adequacy of employment for these largely self-employed workers. 
In the most depressed periods, with many farms and businesses provid- 
ing less than subsistence earnings, these groups served as a primary 
source of concealed unemployment. Although this is probably not the 
situation today, still analyses of income, proper utilization of skills, and 
similar qualitative approaches are needed particularly for a proper ap- 
praisal of the economic position of self-employed workers. 

The higher rate of unemployment in 1949 has not been reflected 
uniformly in the various occupation groups. Generally, the industrial 
workers—the craftsmen, operatives, and nonfarm laborers—who have 
regularly shown a high incidence of unemployment have also suffered 
greater setbacks than most other occupation groups from the peak 
years of 1947 and 1948. 

Information on the industrial distribution of the unemployed is 
rather limited. The 1940 census figures which provide industrial data 
for all components of the unemployed—emergency workers as well as 
those seeking work—do not show wage or salary workers separately. 
Since, in terms of labor market activity, unemployment is a more 
meaningful concept for wage or salary workers, this combination with 
the self-employed obscures some of the significant industrial relation- 
ships. 

For the post-war years, it is possible to construct separate unemploy- 




ment rates for wage and salary workers in major industry groups. How- 
ever, because the available groupings are large, only a few generaliza- 
tions can be made from the data. Throughout this period, the construc- 
tion industry shows the highest rate of the major groups, largely an 
indication of the speculative nature of the activity and the greater 
frictions and seasonality in this type of work. In 1948, the rate for con- 


TABLE 7 


UNEMPLOYMENT RATES FOR WAGE OR SALARY WORKERS, BY MAJOR 
INDUSTRY GROUP: ANNUAL AVERAGE, 1948 AND 1949* 


(In per cent) 


Major industry group†


Total wage or salary workers............ 
Agriculture, forestry, and fishery 


Nondurable goods 
Transportation, communication, and other public utilities 


Personal service except domestic 
Amusement and recreation 
Professional service 


* Annual averages based on weighted arithmetic mean of 12 monthly estimates. 

† The industry categories shown are largely major groups in the classification systems used in the
1940 Census of Population. The specific industry titles included in each major group are given in 
Volume III of the 1940 Census Reports on Population and in the Third Series State Bulletins on Popu- 
lation. 

‡ The month of October has been excluded in computing the rate for this industry; in October,
during the work stoppage in the coal mines, a very large proportion of the miners were classified as 
unemployed because they were reported to be seeking substitute work. If this month were included, the 
resultant average rate would greatly exaggerate the typical situation during the rest of the year. 

Source: Current Population Survey, Bureau of the Census. 


struction workers, on an annual average basis, was approximately 
twice that for all wage or salary workers combined (Table 7). Within 
the service group, workers in the amusement and recreation field also 
revealed a high incidence of unemployment, in spite of the advent of 
television and “South Pacific.” In 1949, the rates of unemployment 
have moved upward in most industries; the rates in durable goods 
manufacturing and in mining have shown relatively large changes, 
both having doubled since 1948. 




These occupation and industry figures, of course, do not take account 
of a fairly important component of the unemployed group, namely 
“new workers.” In Census terminology, a “new worker” is a person 
looking for work who never before held a job lasting two consecutive 
weeks or more. In April 1940, new workers formed 10 per cent of the 
jobless total, and during the war, the proportion advanced to 17 per 
cent in 1943 as inexperienced persons were drawn into the labor market. 
It fell to 8 per cent, however, in the later war years, possibly a reflection 
of the increasing work experience of the population. In the post-war 
years also, it has remained well under the 10 per cent mark. Once 
again, these figures relate to April; at the seasonal peak in July, since 
most new workers are in the younger age group, the proportions in 
many years were as much as two to three times those just cited. 


DURATION OF UNEMPLOYMENT 


In diagnosing the severity of unemployment, second in importance 
only to the number of persons unemployed is information on how long 
they have been out of work. Only relatively crude figures on duration 
of unemployment were developed from the 1940 Census and from the 
sample surveys during the war. Since July 1945, a fairly consistent 
series can be constructed, but because of sampling variability, the 
amount of available detail is necessarily limited. It should be noted 
further that the figures currently available do not relate to final dura- 
tion of unemployment, that is, the complete length of time that unem- 
ployed workers have lost between jobs. Instead, they show the length 
of time that those reported as still unemployed have been continuously 
looking for work up through the survey week. Some experimental work 
is currently in progress to establish, if possible, the relationship of the 
present duration figures to estimates of final duration. Thus far, not 
enough evidence has been accumulated to speak to this point. 

The figures from the 1940 Census, despite their crudity, point up 
quite well the essentially depressed condition of the labor force at that 
time. The average (median) duration of unemployment was in the 
neighborhood of 7 months, and over two-thirds of the jobless had been 
seeking work unsuccessfully for 4 months or longer (Table 8). More- 
over, these figures do not include emergency relief workers whose dura- 
tion of unemployment was undoubtedly longer. During the war, rather 
fragmentary data on duration show that such unemployment as existed 
was almost purely “frictional.” On the average, from two-thirds to 
three-quarters of the unemployed had been looking for work for one 
month or less. 




TABLE 8 


PER CENT DISTRIBUTION OF UNEMPLOYED PERSONS, BY DURATION OF 
UNEMPLOYMENT: MARCH 1940, APRIL 1944, AND QUARTERLY 
JULY 1945-OCTOBER 1949 


                                            Duration of unemployment*
                         Total   Under 1      1        2        3      4 to 6    Over 6
                                  month     month   months   months   months    months

March 1940†              100.0     3.6       9.3      7.6     11.6    ‡67.9       -
April 1944§              100.0    41.7      39.6      7.4      4.4      5.4      1.5
July 1945                100.0    35.8      35.8     16.8      4.2      5.3      2.1
October 1945             100.0    32.1      34.6     23.1      5.8      3.8      0.6
January 1946             100.0    31.7      26.1     18.7     10.4     11.3      1.7
April 1946               100.0    22.7      27.0     16.3     11.6     17.2      5.2
July 1946                100.0    31.3      21.6     12.8     10.1     16.7      7.5
October 1946             100.0    25.0      24.0     18.4      7.7     14.3     10.7

Annual average 1946||    100.0    28.2      22.9     17.9     10.6     14.2      6.2
January 1947             100.0    27.5      24.6     15.4     12.1     12.1      8.3
April 1947               100.0    24.8      30.2     13.6     10.3     13.2      7.9
July 1947                100.0    27.5      40.1     10.6      5.4     10.5      6.0
October 1947             100.0    30.6      33.7     13.2      7.4      8.5      6.5

Annual average 1947||    100.0    24.2      33.8     14.4      9.0     10.9      7.7
January 1948             100.0    26.1      39.6     13.4      6.2      8.9      6.0
April 1948               100.0    23.9      32.6     15.0      9.9     13.5      5.0
July 1948                100.0    29.3      41.1     11.5      6.2      7.0      5.1
October 1948             100.0    26.6      38.3     12.5      6.6      8.5      7.7

Annual average 1948||    100.0    27.3      35.5     14.4      7.9      9.4      5.6
January 1949             100.0    23.9      39.5     17.4      7.5      7.9      3.8
April 1949               100.0    17.6      30.4     18.2     13.4     15.1      5.3
July 1949                100.0    20.4      36.8     15.4      8.8     10.7      8.0
October 1949             100.0    21.1      35.3     12.2      8.4     13.2      9.8

Annual average 1949||    100.0    20.9      33.2     16.1      9.7     12.6      7.5


* Duration represents length of time unemployed persons have been reported as continuously looking for work up through the end of the Census or survey week. Starting in March, 1947, duration has been reported in weeks but the data have been converted to months for comparability with earlier dates.

† Unrevised 1940 Census of Population data relating only to experienced workers seeking work who reported duration of unemployment; excludes emergency relief workers, new workers, and those not reporting duration.

‡ Data relates to duration of 4 months or over. Separate data not available for March 1940 for duration of 4 to 6 months or over 6 months.

§ Estimated from incomplete distribution.

|| Annual average based on weighted arithmetic mean of 12 monthly figures.


Source: 1940 Census of Population and Current Population Survey, Bureau of the Census. 


The “frictional” element has been highly apparent also in the figures
for the post-war years. In 1947 and 1948, approximately three-fifths of
the unemployed, on the average, had been looking for work for one
month or less. With the moderate decline in business activity in 1949,




this proportion has dropped somewhat but has still remained slightly 
above 50 per cent on the average. 
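
These proportions can be reproduced from the annual-average rows of Table 8. The sketch below is a hedged illustration: the column order (under 1 month, 1 month, 2 months, 3 months, 4 to 6 months, over 6 months) is an assumption reconstructed from the table footnotes, and the function name is hypothetical.

```python
# Annual-average per cent distributions of the unemployed from Table 8.
# Assumed column order: under 1 month, 1 month, 2 months, 3 months,
# 4 to 6 months, over 6 months.
annual_average = {
    1946: [28.2, 22.9, 17.9, 10.6, 14.2, 6.2],
    1947: [24.2, 33.8, 14.4, 9.0, 10.9, 7.7],
    1948: [27.3, 35.5, 14.4, 7.9, 9.4, 5.6],
    1949: [20.9, 33.2, 16.1, 9.7, 12.6, 7.5],
}


def short_term_share(distribution):
    """Per cent looking for work one month or less (first two columns)."""
    return distribution[0] + distribution[1]


shares = {year: short_term_share(d) for year, d in annual_average.items()}

for year, share in shares.items():
    print(f"{year}: {share:.1f} per cent unemployed one month or less")
```

Read this way, the 1947 and 1948 shares are close to three-fifths and the 1949 share is a little above one-half, matching the statements in the text.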

Sometimes, however, statistics on duration of unemployment can be 
misleading. When unemployment is relatively stable or is declining, a 
large proportion in the low duration groups is generally indicative of a 
high rate of turnover. However, during periods of rising unemploy- 
ment, particularly when the jobless total is small at the outset, it is to 
be expected also that a large proportion will appear in the lower dura- 
tion groups, at least for a time, because so many have only recently be- 
come unemployed. Thus, in gauging the severity of unemployment in 
1949, it may be more pertinent to examine developments among the 
long-term unemployed. In this respect, it is noteworthy that the num- 
ber reported as jobless for 4 months or longer began a steady rise early 
in 1949 which only recently has shown signs of levelling off. In ab- 
solute numbers, there are now 700,000 to 800,000 unemployed workers 
in this group (about 20 per cent of jobless total) as compared with an 
average of only 300,000 to 400,000 in other post-war years. 
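
As a rough consistency check, the 20 per cent figure can be compared with Table 8, and the jobless total it implies can be backed out from the quoted counts. This is an illustrative sketch only; it assumes the last two columns of the 1949 annual-average row are "4 to 6 months" and "over 6 months", and the implied totals are derived arithmetic, not figures given in the text.

```python
# Annual-average 1949 row of Table 8 (per cent of the unemployed).
# Assumption: the last two columns are "4 to 6 months" and "over 6 months".
four_to_six_months = 12.6
over_six_months = 7.5
long_term_share = four_to_six_months + over_six_months

# Range of long-term unemployed quoted in the text, and the stated share.
long_term_counts = (700_000, 800_000)
stated_share_per_cent = 20

# Jobless total implied if the long-term group is 20 per cent of it.
implied_totals = [count * 100 / stated_share_per_cent for count in long_term_counts]

print(f"Long-term share from Table 8: {long_term_share:.1f} per cent")
print(f"Implied jobless total: {implied_totals[0]:,.0f} to {implied_totals[1]:,.0f}")
```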

Comparisons for men and women show that, both in 1940 and at pres- 
ent, men generally have a higher average duration of unemployment. 
In the post-war period, the duration of unemployment for men, on 
the average, has been a quarter to a third above that for women. Some 
tabulations have recently become available cross-classifying duration 
of unemployment by age, which, although requiring caution in inter- 
pretation because of sampling variation, may serve to throw some addi- 
tional light on this important subject. These tabulations, based on 
quarterly months in 1948 and 1949, reveal a quite evident positive cor- 
relation between duration and age. Generally, the teen-age unemployed 
appear to have the lowest average duration of unemployment, but there 
are no significant differences up to age forty-five. After that point, the 
rate of increase seems to accelerate, until, for those 65 years old and 
over, the average is approximately double that for the teen-age group. 

Much of the unemployment among young persons is of a frictional 
nature, arising from frequent job changes in the course of their orienta- 
tion in the labor force. Among older workers, however, there is undoubt- 
edly a large element of hard-core unemployment, traceable to discrim- 
inations in hiring and to the fact that many have obsolescent skills. 


“UNEMPLOYMENT” IN RELATION TO “FULL EMPLOYMENT” 


With the passage of the Employment Act of 1946, under the new 
functions and responsibilities undertaken by the Federal Government, 
unemployment statistics have assumed a special significance. In the 




popular mind, all too frequently, there is a tendency to interpret meas- 
ures of unemployment as the measure of the extent to which our econ- 
omy falls short of full employment in the sense of the Act. There are 
several fundamental reasons that invalidate such interpretation. In the 
first place, unemployment in the sense of labor market activity, is a 
concept which has less meaning for the self-employed segment of the 
labor force than those who are wage or salaried workers, that is, de- 
pendent upon a job. A less than full employment economy involves 
more than a loss of jobs to wage and salaried workers. It includes de- 
creased sales and income to large and small entrepreneurs who, unless 
they cease operations completely, are “never” unemployed by defini-
tion. It includes a decline in return on investment to large and small
investors, many of whom may never be active participants in the labor 
market. Moreover, a low level of economic activity usually precipitates 
a reduction in hours of work below the desires and needs of many wage 
or salaried workers. It may also increase the number of persons on tem- 
porary layoff or on layoff euphemistically labeled as vacation, who are 
classified as employed. Finally, account must be taken of persons in 
marginal groups or with marginal skills who may leave the labor force 
because of discouragement or never enter the labor market at all even 
though they want or need work. A full measure of the effects of a less 
than full employment economy involves, therefore, the measurement of 
a number of things other than unemployment as defined by the Census. 

The figures on total unemployment and the rate of unemployment 
provided by the Census Bureau should be regarded primarily as an 
indicator of economic change. In this sense, they are particularly useful 
as a danger signal and in pointing up the need for a more critical analy- 
sis of the functioning of the economy. The differentiation in the labor 
force statistics on the composition of the unemployed, the duration of 
unemployment, and the vast assemblage of data on the employed will 
serve well in any such analysis. Moreover, through periodic supple- 
mentary questioning, information is provided on the extent of involun- 
tary part-time employment, the existence of an unmeasured “fringe” 
group of workers and related subjects. Unfortunately, many consumers 
of these statistics focus only on the total volume of unemployment re- 
ported instead of following changes in the components of the labor force 
which may also be significant. This may, in part, be attributable to the 
fact that the press generally features the total unemployment figures be- 
cause of their greater newsworthiness and either fails to report or ob- 
scures other changes in the labor force. 

Perhaps, for some purposes, it might be useful to develop a single 




summary measurement which seeks to take account of these various 
elements. However, whatever its nature, such a measurement could 
not be regarded as indicative of the failure of the economy to provide 
full employment as seen in the Act. It could not possibly incorporate 
the element of “adequacy” of employment which is a function of in- 
come received, proper utilization of skills, and various other qualita- 
tive factors. In fact, it may be, that from the standpoint of analysis, 
what is needed is greater differentiation rather than more summariza- 
tion. In this light, it would be the function of the agencies responsible 
for public policy in the economic field—and, in the main, they are do- 
ing this—to assemble the various pieces and provide the administra- 
tion, the Congress, and the public with a rounded and complete picture 
of the economic situation. 

Another caution that must be exercised in interpreting either total 
unemployment figures or various differentiations of the unemployed 
or part-time employed, and other categories referred to, as measures 
of a less than full employment economy centers around the relation of 
the volume and composition of the unemployed at any time to the size 
and composition of the total labor force. The monthly statistics of the 
labor force and its components have taught us much about dynamics 
of the labor supply and its utilization that was virtually unknown prior 
to 1940. It was possible with great accuracy to measure the expansion 
in the total labor force of the nation including the Armed Forces during 
the war; its temporary contraction following the cessation of hostili- 
ties; and the resumption of expansion under conditions of unprece- 
dented peacetime production in 1947 and 1948. As a result, we know 
that the labor force at the height of the war contained some 8 million 
more workers than could have been anticipated on the basis of the pre- 
war secular trend. We know further that this so-called “excess over 
normal” declined to 1.1 million persons in April 1947; rose to 1.6 million 
persons in April 1948; and was still at a level of 1.3 million persons in
1949. We know also that in April of this year the total labor force still 
contained an “excess over normal” of 1.5 million boys and girls 14 to 19 
years of age, many of whom would normally be in school and 600,000 
older workers, men and women, many of whom would normally be re- 
tired. We know further that in the same month there was a deficit of 
1.7 million women 20 to 44 years of age in the labor force primarily be- 
cause of the extraordinarily high marriage and birth rate during the 
war and early post-war years, and a deficit of .4 million males 20 to 34 
years of age in the labor force primarily because of schooling being re- 
ceived by veterans under the GI Bill of Rights. 




These facts have important implications for government and busi- 
ness policy designed to cope with the problems of maintaining full em- 
ployment. Apparently, we as a nation are confronted with certain basic 
decisions. Either, we have to be prepared to provide useful employ- 
ment opportunities to all those “able, willing, and seeking to work,” 
including those who, by pre-war standards, might be regarded as sur- 
plus to the labor supply, or, we have to offer special inducements to 
persons at opposite ends of the age scale, on the one hand, to postpone 
their entry into the labor force and, on the other, to retire from the 
labor force at an earlier age. 

The measurement of the volume of unemployment and the charac- 
teristics of the unemployed which have been available on a current 
basis since 1940 are among the more important statistical develop- 
ments of the decade. Despite some obvious limitations of the data, de- 
spite some disagreement with the allocation of some of the marginal 
components of the labor force to the employed or unemployed cate- 
gories, the fact is, that we have developed and we have at our disposal 
through the Bureau of the Census, a generally accepted, reliable, and 
valid measurement of the total labor force, and of total employment 
and unemployment on a current basis. The monthly series available 
since 1940 has proved itself in its ability to measure the dramatic 
changes in the composition and size of the labor force brought about by 
the exigencies of war and post-war adjustment. The series actually 
reflects both increases and decreases not only in unemployment, but in 
the size and composition of the total labor force and its varied compo- 
nents. There is undoubtedly room for improvement particularly in the 
methods of measurement used in the labor force censuses and surveys. 
But we can look to the future with reasonable confidence that we need 
not again face the uncertainties, confusion and chaos of the 30’s where 
no one could answer the question—How Many Unemployed Are There? 
—let alone the question—Who Are the Unemployed? 




SOME SAMPLING SIMPLIFIED 


JOHN W. TUKEY*
Princeton University 


Results in the theory of sampling from finite populations 
can be obtained very easily by working in terms of k’s. The 
single subscript k’s were introduced by R. A. Fisher [2], while 
the multiple subscript ones seem to be new. The k’s have very 
simple general properties as regards sampling from one or 
more finite populations, and may be easily computed nu- 
merically. Applications are made here mostly to known re- 
sults, while further applications to obtain new results will be 
made elsewhere. 


INTRODUCTION 


THE PURPOSE of the paper is to exhibit simple methods for dealing
with sampling from finite populations. It has always been argued
that infinite populations were simpler, and that finite populations in- 
troduced complications. The writer holds the contrary opinion—he 
believes that finite populations will prove simpler (and more powerful). 
The derivations given here illustrate this. 

The central tools are the expressions whose average in samples of 
any size is the same as their values for the whole finite population. It 
turns out to be easy to find all symmetric expressions with this property 
of being inherited on the average. Among these expressions certain 
ones are singled out by their simple behavior when the observations are 
sums of randomly corresponding elements in samples drawn from two 
populations. These are the k’s, which can be calculated numerically, as 
is shown, with little more effort than the calculation of moments about 
the mean. 

The variances and covariances of the sample variance and of the 
sample mean (when sampling from a finite population) can be found 
very easily as we show in a later section. With one added device, these 
methods give very simple and rapid methods for calculating sampling 
variances of variance components and the like. This application will 
be reported elsewhere, but deserves mention here because of its utility 
and because of the need for analysis-of-variance theory admitting one 
or more finite populations. In practice, the populations of plots, 
chemists, or times of day are finite! 

The k’s with one subscript were introduced by R. A. Fisher in 1930 


* Fellow of the John Simon Guggenheim Foundation. 


[2], in the most important step so far taken in connection with sampling 
moments. He was interested in their inheritance on the average for 
samples drawn from infinite populations. He showed that the sampling 
behavior of the k’s was much simpler than that of the sample moments, 
and showed how to apply combinatorial ideas to find the sampling 
cumulants of the k’s. 

Fisher has had a proof [1, page 218] for many years that the one- 
subscript k’s were inherited on the average from finite populations. 
However, the first substantial use of this fact was made by Irwin and 
Kendall [3], who applied the principle to the evaluation of variances 
and covariances of sample k’s. 

The two and more subscript k’s are believed to be new. 


INHERITANCE ON THE AVERAGE 


We are interested in samples of some fixed size, say n, drawn from 
a finite population. Let us write $x_1', x_2', \ldots, x_N'$ for the N elements
in the population and $x_1, x_2, \ldots, x_n$ for the n elements in the sample.
There are $\binom{N}{n}$ kinds of samples, and they are equally probable by
definition. We shall write “ave” for the operation of averaging over all 
these kinds of samples, and reserve the overscore, as usual, to indicate 
means over the sample. 

Suppose, for example, that N=4, n=3 and that the elements of the
population have the values 1, 3, 7 and 25. There are $\binom{4}{3}=4$ kinds of
samples as follows:


    KIND OF SAMPLE        SAMPLE MEAN

    1, 3, 7               (1+3+7)/3  = 3 2/3
    1, 3, 25              (1+3+25)/3 = 9 2/3
    1, 7, 25              (1+7+25)/3 = 11
    3, 7, 25              (3+7+25)/3 = 11 2/3

It is well known that the average of the sample means,

    \operatorname{ave}\{\bar{x}\} = \operatorname{ave}\left\{ \frac{1}{n} \sum_{i=1}^{n} x_i \right\},

is always the same as the mean of the population

    \bar{x}' = \frac{1}{N} \sum_{i=1}^{N} x_i'.


We shall express this by saying that the mean is inherited on the 
average. In general, an expression will be said to be inherited on the 
average if, whatever be n and N, the average over all samples of size 
n is equal to its value for the population of size N from which the 
samples were drawn. We seek now for other expressions that are in- 
herited on the average. 
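
The definition can be checked directly on the four-element example by listing every sample. The following is a minimal sketch in Python, not part of the original paper; the names `population`, `samples` and `mean` are chosen here for illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 3, 7, 25]   # the N = 4 values of the example
n = 3                        # sample size

def mean(values):
    """Exact arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

# Every kind of sample of size n, each equally probable.
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# "ave": the average over all kinds of samples.
ave_of_sample_means = mean(sample_means)
population_mean = mean(population)

for s, m in zip(samples, sample_means):
    print(s, m)
print("ave of sample means:", ave_of_sample_means)
print("population mean:    ", population_mean)
```

The two printed values coincide, which is what "inherited on the average" asserts for the mean.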

The simplest of these are to be found by changing x into $x^2$, $x^3$, ...,
and so on. If we think of new populations made up of the squares,
cubes, ..., and so on of the original values, we see that this is really
a special case of the result for the mean. To express this fact in
formulas, we write

    (p) = \frac{1}{n} \sum_{i=1}^{n} x_i^p,        (p)' = \frac{1}{N} \sum_{i=1}^{N} (x_i')^p

for the averages of $x^p$ over the sample and the population respectively.
Then we have


    \operatorname{ave}\{(p)\} = (p)'.


Let us consider how to prove this simply. 

Since (p) is an average of $x^p$ over the sample, the result of averaging
further (over all samples) must be a weighted average of the values
of $x^p$ in the population. Since each element of the population appears
symmetrically in the samples, the weights must be equal. Since we
have an average, the weights must add to unity. Hence, the weights
are all 1/N, and the average of the sample means is just the simple
average of all the values of $x^p$ for x in the population; this is (p)'. In
terms of our example, this argument appears as follows:


    \operatorname{ave}\{(p)\} = \frac{1}{4}\left[ \frac{1}{3}(1^p + 3^p + 7^p) + \frac{1}{3}(1^p + 3^p + 25^p)
                  + \frac{1}{3}(1^p + 7^p + 25^p) + \frac{1}{3}(3^p + 7^p + 25^p) \right]
              = \frac{1}{12}\left[ 3(1^p) + 3(3^p) + 3(7^p) + 3(25^p) \right]
              = \frac{1}{4}\left[ 1^p + 3^p + 7^p + 25^p \right]
              = (p)'.




In general, each $x^p$ has weight 1/n in the $\binom{N-1}{n-1}$ kinds of samples in
which it occurs and zero elsewhere. And

    \frac{1}{n} \binom{N-1}{n-1} \Big/ \binom{N}{n} = \frac{1}{N}.

Now let us try to extend this argument. We seek an average over 
the sample which, when averaged further over all samples, gives an 
average over the population of the same form with which we started. 
Each pair $x_i$, $x_j$ of different elements (different in subscripts, but
not necessarily in value) appears in the same number, $\binom{N-2}{n-2}$, of
samples. Thus

    (11) = \frac{1}{n(n-1)} \sum_{i \ne j} x_i x_j,

where $\sum_{i \ne j}$ indicates summation over unequal indices, is inherited on
the average, for we can copy the argument we used for the sample
mean, changing only “elements” to “pairs of distinct elements”.

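More generally, we define

    (pqr\cdots) = \frac{1}{n(n-1)(n-2)\cdots} \sum_{\ne} x_i^p x_j^q x_k^r \cdots,

where the summation is over all sets of unequal indices,
and see that all these expressions, which we shall call merely brackets,
are inherited on the average.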
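
As an illustration, the brackets can be computed by brute force straight from this definition and their inheritance verified on the four-element example. This is only a sketch: the function names `bracket` and `ave` are introduced here, and enumerating all ordered selections is practical only for very small sets.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod

def bracket(powers, values):
    """Bracket (pqr...) of `values`: the average of x_i^p x_j^q x_k^r ...
    over all ordered selections of distinct positions i, j, k, ..."""
    terms = [prod(x ** p for x, p in zip(chosen, powers))
             for chosen in permutations(values, len(powers))]
    return Fraction(sum(terms), len(terms))

def ave(powers, population, n):
    """Average of the sample bracket over all samples of size n."""
    values = [bracket(powers, s) for s in combinations(population, n)]
    return sum(values) / len(values)

population = [1, 3, 7, 25]
for powers in [(1,), (2,), (1, 1), (1, 2), (1, 1, 1)]:
    # Each line prints the same number twice if the bracket is inherited.
    print(powers, ave(powers, population, 3), bracket(powers, population))
```

Positions rather than values are permuted, so elements that happen to be equal in value are still treated as distinct, as the text requires.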

A result which we do not need explicitly, and whose proof we leave 
to the reader, is this: Every expression which is 


(i) a polynomial 
(ii) symmetric 
(iii) inherited on the average 


can be written as a linear combination of brackets with coefficients which 
do not depend on the size of the set of numbers concerned. Not only are 
the brackets inherited on the average, but there are enough brackets 
to describe easily any polynomial symmetric expression that is
inherited on the average.


RANDOMIZED SUMS 
Let us consider two sets of n values, $\{x_i\}$ and $\{y_a\}$, and the result
of adding these together to form n sums $\{x_i + y_a\}$ in all the n! possible
ways. A simple example for n=3 is obtained by taking the x’s to be 1,
3 and 8 and the y’s to be 20, 40 and 60. There are then 3!=6 sets of
sums; namely



21, 43, 68; 23, 48, 61; 
21, 48, 63; 28, 41, 63; 
23, 41, 68; 28, 43, 61. 


Such sets would arise, for example, when three varieties of corn, of 
natural excess fertility 1, 3 and 8, were assigned at random to three 
fields of standard fertility 20, 40 and 60. Or when measured values are 
the sum of true values and unknown but definite functions of the 
weather, and the dates of test are assigned at random. And so on. The 
more realistic case where the n x’s are a sample from a finite
population and the n y’s are a sample from another finite population will be
discussed in the next section.

We are interested in expressions whose value for these sums, when
averaged over the n! sets, is expressed simply in terms of the individual
sets. The simplest expression of this sort is just the sum of the values,

    J_1 = \sum x_i,

since

    \sum (x_i + y_a) = \sum x_i + \sum y_a

without benefit of randomization. We can find others, however, for
which randomization will lead to an equally simple result. For example,
consider the expression

    J_2 = \sum (x_i - x_j)^2,

where the sum is taken over i and j distinct (although in this example
we need not make this restriction). For one particular set of sums this
becomes

    J_2 = \sum (x_i + y_a - x_j - y_b)^2
        = \sum (x_i - x_j)^2 + 2 \sum (x_i - x_j)(y_a - y_b) + \sum (y_a - y_b)^2.

Let us consider the effect of averaging over the n! sets of sums, which
we denote by “aver”. The first and third terms on the right are
unaffected, but what of the middle term? For every set of sums in which
there appears

    x_i + y_a   and   x_j + y_b

there will be one in which there appears

    x_i + y_b   and   x_j + y_a.

But the contributions to the middle term from these two pairs are equal
and opposite. Thus, on the average, the middle term cancels out to
zero, and we have

    \operatorname{aver}\{J_2\} = \operatorname{aver}\left\{ \sum (x_i + y_a - x_j - y_b)^2 \right\}
               = \sum (x_i - x_j)^2 + \sum (y_a - y_b)^2
               = J_2^{*} + J_2^{**},

where * and ** refer to the first and second set of n numbers
respectively.
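If we consider the expression

    J_3 = \sum (2x_i - x_j - x_k)^3

and write it out for the sums, we find

    J_3 = \sum (2x_i + 2y_a - x_j - y_b - x_k - y_c)^3
        = \sum (2x_i - x_j - x_k)^3 + 3 \sum (2x_i - x_j - x_k)^2 (2y_a - y_b - y_c)
          + 3 \sum (2x_i - x_j - x_k)(2y_a - y_b - y_c)^2 + \sum (2y_a - y_b - y_c)^3
        = \sum (2x_i - x_j - x_k)^3 + \sum (2y_a - y_b - y_c)^3
          + 3 \sum (2x_i - x_j - x_k)^2 (y_a - y_b)
          + 3 \sum (2x_i - x_j - x_k)^2 (y_a - y_c)
          + 3 \sum (x_i - x_j)(2y_a - y_b - y_c)^2
          + 3 \sum (x_i - x_k)(2y_a - y_b - y_c)^2.


Now each of the last four terms vanishes (individually) on the
average for the same reason as above. Thus, we have


    \operatorname{aver}\{J_3\} = J_3^{*} + J_3^{**}.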
We can go on with 
Je = + 25 — — tm)! 
— + — te — — — Te + Im)*] 
but we leave the proof that 
    \operatorname{aver}\{J_4\} = J_4^{*} + J_4^{**}


to the reader. 
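There is some interest in expressions with more complicated
relations, so that we may consider

    J_{11} = \sum_{i \ne j} x_i x_j.

If we write this for the pairs, we find

    J_{11} = \sum (x_i + y_a)(x_j + y_b)
           = \sum x_i x_j + \sum x_i y_b + \sum x_j y_a + \sum y_a y_b,

    \operatorname{aver}\{J_{11}\} = J_{11}^{*} + \frac{2(n-1)}{n} J_1^{*} J_1^{**} + J_{11}^{**}.
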


The coefficient of $J_1^{*} J_1^{**}$ is complex, and so we inquire into its causes.
To do this, let us write out $J_1$, $J_2$, and so on, in terms of the brackets.
We find


    J_1 = n(1),
    J_2 = 2n(n-1)((2) - (11)),
    J_3 = 6n(n-1)(n-2)((3) - 3(12) + 2(111)),
    J_4 = 4n(n-1)(n-2)((4) - 3(22) - 4(13) + 12(112) - 6(1111)).

Clearly we can suppress the apparently useless factors and introduce

    k_1 = (1)
    k_2 = (2) - (11)
    k_3 = (3) - 3(12) + 2(111)
    k_4 = (4) - 3(22) - 4(13) + 12(112) - 6(1111)

and have

    \operatorname{aver}\{k_p\} = k_p^{*} + k_p^{**}.


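Our relation for $J_{11}$ becomes, when we go to


    k_{11} = (11),

    \operatorname{aver}\{k_{11}\} = \frac{1}{n(n-1)} \operatorname{aver}\{J_{11}\}
                  = k_{11}^{*} + \frac{2}{n^2} (n k_1^{*})(n k_1^{**}) + k_{11}^{**}
                  = k_{11}^{*} + 2 k_1^{*} k_1^{**} + k_{11}^{**},


a pleasantly simple relation.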
We can extend such relations by defining


    k_{12}   = (12) - (111),
    k_{111}  = (111),
    k_{22}   = (22) - 2(112) + (1111),
    k_{13}   = (13) - 3(112) + 2(1111),
    k_{112}  = (112) - (1111),
    k_{1111} = (1111),


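when we find

    \operatorname{aver}\{k_{12}\} = k_{12}^{*} + k_1^{*} k_2^{**} + k_1^{**} k_2^{*} + k_{12}^{**},

    \operatorname{aver}\{k_{pqr}\} = k_{pqr}^{*} + k_{pq}^{*} k_r^{**} + \cdots + k_{pqr}^{**},

and so on. The various k’s, then, are the expressions with simple
behavior for randomized sums.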
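
These relations can be verified exactly on the small corn example (x’s 1, 3, 8 and y’s 20, 40, 60) by forming all 3! pairings. The sketch below is illustrative only; `bracket`, `aver` and the `k...` functions are names assumed here, with the brackets computed by brute force from their definition.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def bracket(powers, values):
    """Average of x_i^p x_j^q ... over ordered selections of distinct positions."""
    terms = [prod(x ** p for x, p in zip(chosen, powers))
             for chosen in permutations(values, len(powers))]
    return Fraction(sum(terms), len(terms))

def k1(v):  return bracket((1,), v)
def k2(v):  return bracket((2,), v) - bracket((1, 1), v)
def k11(v): return bracket((1, 1), v)
def k12(v): return bracket((1, 2), v) - bracket((1, 1, 1), v)

def aver(stat, xs, ys):
    """Average of `stat` over the n! sets of sums obtained by random pairing."""
    values = [stat([x + y for x, y in zip(xs, paired)])
              for paired in permutations(ys)]
    return sum(values) / len(values)

xs, ys = [1, 3, 8], [20, 40, 60]
print(aver(k2, xs, ys),  k2(xs) + k2(ys))
print(aver(k11, xs, ys), k11(xs) + 2 * k1(xs) * k1(ys) + k11(ys))
print(aver(k12, xs, ys), k12(xs) + k1(xs) * k2(ys) + k1(ys) * k2(xs) + k12(ys))
```

Each line prints two equal fractions, one obtained by averaging over the pairings and one from the two sets taken separately.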
SAMPLED SUMS 


We are now prepared to put the ideas of randomized sums and
sampling together. Let us consider a population of x’s of size N’, and
a population of y’s of size N’’, and let us first draw a sample of n from
each and then form n sums by random pairing. We are interested in
the average over all samples and pairings. Thus, we want to find

    \operatorname{ave}\{\operatorname{aver}\{k_p\}\} = \operatorname{ave}\{k_p^{*} + k_p^{**}\} = k_p' + k_p''


where * and ** refer to the two samples of n, and ’ and ” refer to the two
populations of N’ and N’’. We have used the formula for aver $\{k_p\}$
and the fact that $k_p^{*}$ and $k_p^{**}$ are linear combinations of brackets and
so are inherited on the average. Similarly

    \operatorname{ave}\{\operatorname{aver}\{k_{12}\}\} = k_{12}' + k_1' k_2'' + k_1'' k_2' + k_{12}''


and so on.

These formulas for sampling combined with random pairing include
the results of the previous sections as the special cases (i) all y’s=0,
(ii) N’=N’’=n.

The k’s are very nicely behaved expressions. We have given formulas 
for them in terms of the brackets, but it is generally slightly simpler 
to work with the reversed formulas, namely 


    (1)    = k_1
    (11)   = k_{11}
    (2)    = k_{11} + k_2
    (111)  = k_{111}
    (12)   = k_{111} + k_{12}
    (3)    = k_{111} + 3k_{12} + k_3
    (1111) = k_{1111}
    (112)  = k_{1111} + k_{112}
    (13)   = k_{1111} + 3k_{112} + k_{13}
    (22)   = k_{1111} + 2k_{112} + k_{22}
    (4)    = k_{1111} + 6k_{112} + 4k_{13} + 3k_{22} + k_4.


Notice that all signs are positive, and that coefficients are slightly
smaller.

If we want to find the k’s from the brackets, we can work down in
order using, for example

    k_3 = (3) - k_{111} - 3k_{12}.

COMPUTATION 


If we are going to be interested in the k’s, we need to be able to 
calculate them conveniently. This we can do more simply with the aid 
of further formulas. We now introduce the first four power sums 


    S_1 = \sum x   = n(1)
    S_2 = \sum x^2 = n(2)
    S_3 = \sum x^3 = n(3)
    S_4 = \sum x^4 = n(4)

which are the natural quantities to compute first, and observe that

    S_1 (1) = \frac{1}{n} \left( \sum x \right)^2 = \frac{1}{n} \sum x^2 + \frac{1}{n} \sum_{i \ne j} x_i x_j
            = (2) + (n-1)(11)

so that

    (11) = (S_1 (1) - (2))/(n-1).

Similarly we find that

    (12)   = (S_1 (2) - (3))/(n-1)
    (13)   = (S_1 (3) - (4))/(n-1)
    (22)   = (S_2 (2) - (4))/(n-1)
    (111)  = (S_1 (11) - 2(12))/(n-2)
    (112)  = (S_1 (12) - (13) - (22))/(n-2)
    (1111) = (S_1 (111) - 3(112))/(n-3).


These formulas will lead us to a simple calculation. The algebraic 
form is given in Table 1 and a numerical example is given in Table 2. 
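
The recurrences can be checked against the direct definition of the brackets on a small set of numbers. The sketch below is an illustration in Python with exact fractions; the five values and the names `bracket_direct` and `brackets_from_power_sums` are assumptions made here, not taken from the paper.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def bracket_direct(powers, values):
    """Bracket from its definition: average over ordered distinct positions."""
    terms = [prod(x ** p for x, p in zip(chosen, powers))
             for chosen in permutations(values, len(powers))]
    return Fraction(sum(terms), len(terms))

def brackets_from_power_sums(values):
    """Brackets through order four, using only S1..S4 and the recurrences."""
    n = len(values)
    S1, S2, S3, S4 = (Fraction(sum(x ** p for x in values)) for p in (1, 2, 3, 4))
    b = {(1,): S1 / n, (2,): S2 / n, (3,): S3 / n, (4,): S4 / n}
    b[(1, 1)] = (S1 * b[(1,)] - b[(2,)]) / (n - 1)
    b[(1, 2)] = (S1 * b[(2,)] - b[(3,)]) / (n - 1)
    b[(1, 3)] = (S1 * b[(3,)] - b[(4,)]) / (n - 1)
    b[(2, 2)] = (S2 * b[(2,)] - b[(4,)]) / (n - 1)
    b[(1, 1, 1)] = (S1 * b[(1, 1)] - 2 * b[(1, 2)]) / (n - 2)
    b[(1, 1, 2)] = (S1 * b[(1, 2)] - b[(1, 3)] - b[(2, 2)]) / (n - 2)
    b[(1, 1, 1, 1)] = (S1 * b[(1, 1, 1)] - 3 * b[(1, 1, 2)]) / (n - 3)
    return b

values = [1, 3, 7, 25, 4]          # any set with at least four elements
for powers, value in brackets_from_power_sums(values).items():
    print(powers, value, value == bracket_direct(powers, values))
```

The recurrences need only the four power sums, whereas the direct definition requires a sum over all ordered selections; the agreement on a small set is a check on the algebra, not a practical method of computation.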




TABLE 1

ARRANGEMENT OF COMPUTATION
(ALGEBRAIC FORM)


       x        f        fx          fx^2          fx^3          fx^4

      x_1      f_1     f_1 x_1     f_1 x_1^2     f_1 x_1^3     f_1 x_1^4
      ...
      x_p      f_p     f_p x_p     f_p x_p^2     f_p x_p^3     f_p x_p^4

     sums       n        S_1         S_2           S_3           S_4


                                      k_1 = (1)
                   k_2             k_11 = (11)             (2)
        k_3        k_12           k_111 = (111)            (12)        (3)
 k_4  { k_13       k_112         k_1111 = (1111)           (112)     { (13)      (4)
      { k_22                                                         { (22)


TABLE 2
NUMERICAL EXAMPLE OF COMPUTATION


       x        f       fx     fx^2     fx^3     fx^4

      -3        9      -27       81     -243      729
      -2        4       -8       16      -32       64
     . . .
       0        2        0        0        0        0
       1       10       10       10       10       10
     . . .
       3       30       90      270      810     2430
       4       15       60      240      960     3840
       5        7       35      175      875     4375

     sums     100      197      875     2537    11781


                                   1.9700
                    4.918          3.8317          8.7500
       -11.391      9.803          7.3524         17.1554     25.3700
 7.238 { -22.514   19.301         13.9052         33.2059   { 49.2937    117.8100
       {  23.639                                            { 76.1459


The ritual of calculation is as follows:


(i) Calculate the power-sums S_1, S_2, S_3, S_4 by any of the
standard procedures (just as you were about to calculate moments).

(ii) Divide each power-sum by n, entering the results (1),
(2), (3), (4) in a descending diagonal (these are the moments
about zero).

      1.9700 = 197/100
      8.7500 = 875/100
     25.3700 = 2537/100
    117.8100 = 11781/100

(iii) Multiply each value in the top diagonal in turn by S_1,
subtract the next entry to the right and divide by n-1 (one
machine operation with division by an integer), enter in the
second diagonal

     3.8317 = (197(1.9700) - 8.7500)/99
    17.1554 = (197(8.7500) - 25.3700)/99
    49.2937 = (197(25.3700) - 117.8100)/99

(iv) Multiply (2) by S_2, subtract (4), divide the result by
n-1 and enter just below the last entry, joining them with a
bracket

    76.1459 = (875(8.7500) - 117.8100)/99

(v) Take the top entry in the second diagonal, multiply by
S_1, subtract twice the second entry in the diagonal, divide the
result by n-2 and enter in the top of the third diagonal.

    7.3524 = (197(3.8317) - 2(17.1554))/98

(vi) Take the second entry in the second diagonal, multiply
by S_1, subtract the two bracketed entries to the right, divide by
n-2 and enter

    33.2059 = (197(17.1554) - 49.2937 - 76.1459)/98

(vii) Take the top entry in the third diagonal, multiply by
S_1, subtract three times the other entry, divide by n-3 and enter

    13.9052 = (197(7.3524) - 3(33.2059))/97

(viii) Subtract the central column from the first column to
the right and enter in the first column to the left

     4.918 = 8.7500 - 3.8317
     9.803 = 17.1554 - 7.3524
    19.301 = 33.2059 - 13.9052.

(ix) Subtract the central column and three times the first
column to the left from the second column to the right, enter
in the second column to the left. (When working with the lower
bracketed entry subtract only twice the first left column.)

    -11.391 = 25.3700 - {7.3524 + 3(9.803)}
    -22.514 = 49.2937 - {13.9052 + 3(19.301)}
     23.639 = 76.1459 - {13.9052 + 2(19.301)}.

(x) In the last line, subtract center entry, six times first left
entry, four times upper bracketed entry, and three times lower
bracketed entry from the extreme right entry, and enter at
extreme left

    7.238 = 117.8100 - {13.9052 + 6(19.301)
                        + 4(-22.514) + 3(23.639)}.


This routine is systematic, working first down diagonals, and then
across columns in an easily remembered pattern. No numbers need be
written except those which are of possible use.

The procedure illustrated in Table 2 of carrying two extra decimals
for the brackets and one extra decimal for the k’s is recommended as
a guard against accumulation of round-off errors. The final results
would be, in this example


    k_1 = 1.97        k_3 = -11.39
    k_2 = 4.92        k_4 = 7.24

    k_11 = 3.83       k_111 = 7.35

    k_12 = 9.80       k_112 = 19.30

    k_13 = -22.51     k_1111 = 13.91

    k_22 = 23.64
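
The ritual of steps (ii) to (x) can be sketched as a short routine that starts from n and the four power sums. This is an illustrative Python version under the assumption that ordinary floating-point arithmetic is acceptable; the function name `k_routine` and the dictionary keys are introduced here. Because it carries full precision rather than the four or three decimals of a desk calculation, its output may differ from hand-rounded entries in the last decimal carried.

```python
def k_routine(n, S1, S2, S3, S4):
    """Brackets and k's through order four from the first four power sums."""
    b = {}
    # (ii) moments about zero: the top diagonal
    b["1"], b["2"], b["3"], b["4"] = S1 / n, S2 / n, S3 / n, S4 / n
    # (iii), (iv) second diagonal and the bracketed entry (22)
    b["11"] = (S1 * b["1"] - b["2"]) / (n - 1)
    b["12"] = (S1 * b["2"] - b["3"]) / (n - 1)
    b["13"] = (S1 * b["3"] - b["4"]) / (n - 1)
    b["22"] = (S2 * b["2"] - b["4"]) / (n - 1)
    # (v), (vi), (vii) third diagonal and (1111)
    b["111"] = (S1 * b["11"] - 2 * b["12"]) / (n - 2)
    b["112"] = (S1 * b["12"] - b["13"] - b["22"]) / (n - 2)
    b["1111"] = (S1 * b["111"] - 3 * b["112"]) / (n - 3)
    # central column: k's whose subscripts are all 1
    k = {"1": b["1"], "11": b["11"], "111": b["111"], "1111": b["1111"]}
    # (viii) first column to the left
    k["2"] = b["2"] - b["11"]
    k["12"] = b["12"] - b["111"]
    k["112"] = b["112"] - b["1111"]
    # (ix) second column to the left
    k["3"] = b["3"] - (b["111"] + 3 * k["12"])
    k["13"] = b["13"] - (b["1111"] + 3 * k["112"])
    k["22"] = b["22"] - (b["1111"] + 2 * k["112"])
    # (x) extreme left entry
    k["4"] = b["4"] - (b["1111"] + 6 * k["112"] + 4 * k["13"] + 3 * k["22"])
    return b, k

# Power sums of the numerical example in Table 2.
brackets, k = k_routine(100, 197, 875, 2537, 11781)
for name in ("1", "2", "3", "4", "11", "12", "13", "22", "111", "112", "1111"):
    print(f"k_{name} = {k[name]:.3f}")
```

A convenient independent check is that k_2 must equal the usual variance with divisor n-1, here (875 - 197^2/100)/99.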
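A considerable check on all the computations may be had by verifying
that

    k_4 = (4) - 3(22) - 4(13) + 12(112) - 6(1111)

    k_{11}   = k_1^2 - \frac{1}{n} k_2
    k_{12}   = k_1 k_2 - \frac{1}{n} k_3
    k_{13}   = k_1 k_3 - \frac{1}{n} k_4
    k_{22}   = \frac{n-1}{n+1} \left( k_2^2 - \frac{1}{n} k_4 \right)
    k_{111}  = k_{11} k_1 - \frac{2}{n} k_{12}
    k_{112}  = k_{11} k_2 - \frac{2}{n} k_{13} + \frac{2}{n(n-1)} k_{22}
    k_{1111} = k_{111} k_1 - \frac{3}{n} k_{112}.


These check formulas can be derived by direct expansion.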
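
The identities relating products of k’s to k’s with several subscripts can be tested exactly on any small set of numbers. The sketch below, in Python with exact fractions, is illustrative; the data and the names `bracket`, `polykays` and `check_formulas` are assumptions introduced here, and the term in k_22 in the relation for k_112 is included because the direct expansion of k_11 k_2 produces it.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def bracket(powers, values):
    """Average of x_i^p x_j^q ... over ordered selections of distinct positions."""
    terms = [prod(x ** p for x, p in zip(chosen, powers))
             for chosen in permutations(values, len(powers))]
    return Fraction(sum(terms), len(terms))

def polykays(values):
    """All k's through order four, from their definitions in terms of brackets."""
    b = lambda *powers: bracket(powers, values)
    return {
        "1": b(1), "11": b(1, 1), "111": b(1, 1, 1), "1111": b(1, 1, 1, 1),
        "2": b(2) - b(1, 1),
        "12": b(1, 2) - b(1, 1, 1),
        "112": b(1, 1, 2) - b(1, 1, 1, 1),
        "3": b(3) - 3 * b(1, 2) + 2 * b(1, 1, 1),
        "13": b(1, 3) - 3 * b(1, 1, 2) + 2 * b(1, 1, 1, 1),
        "22": b(2, 2) - 2 * b(1, 1, 2) + b(1, 1, 1, 1),
        "4": b(4) - 3 * b(2, 2) - 4 * b(1, 3) + 12 * b(1, 1, 2) - 6 * b(1, 1, 1, 1),
    }

def check_formulas(values):
    """Return, for each check formula, whether it holds exactly."""
    n, k = len(values), polykays(values)
    return {
        "11": k["11"] == k["1"] ** 2 - k["2"] / n,
        "12": k["12"] == k["1"] * k["2"] - k["3"] / n,
        "13": k["13"] == k["1"] * k["3"] - k["4"] / n,
        "22": k["22"] == Fraction(n - 1, n + 1) * (k["2"] ** 2 - k["4"] / n),
        "111": k["111"] == k["11"] * k["1"] - 2 * k["12"] / n,
        "112": k["112"] == (k["11"] * k["2"] - 2 * k["13"] / n
                            + 2 * k["22"] / (n * (n - 1))),
        "1111": k["1111"] == k["111"] * k["1"] - 3 * k["112"] / n,
    }

print(check_formulas([1, 3, 7, 25, 4]))
```

Every entry of the printed dictionary should be `True`.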


INFINITE POPULATIONS 


The classical theory of sampling was concerned with sampling from 
infinite populations. Thus, we shall come to more familiar ground if 
we consider what happens to the brackets and the k’s when the 
population becomes infinite. The formulas we used to compute the 
brackets numerically can be written as follows: 


    (11)   = \frac{n}{n-1} (1)^2 - \frac{1}{n-1} (2)

    (12)   = \frac{n}{n-1} (1)(2) - \frac{1}{n-1} (3)

    (13)   = \frac{n}{n-1} (1)(3) - \frac{1}{n-1} (4)

    (22)   = \frac{n}{n-1} (2)^2 - \frac{1}{n-1} (4)

    (111)  = \frac{n}{n-2} (1)(11) - \frac{2}{n-2} (12)

    (112)  = \frac{n}{n-2} (1)(12) - \frac{1}{n-2} (13) - \frac{1}{n-2} (22)

    (1111) = \frac{n}{n-3} (1)(111) - \frac{3}{n-3} (112).


When we let n tend to infinity, and notice that (p) is always just the
pth moment about zero, we see that: For an infinite population, each
bracket reduces to the product of the moments about zero of orders specified
in the bracket. Thus, for an infinite population,


    (112) = (1)(12) = (1)(1)(2) = \mu_1^2 \mu_2.

And when we consider the k’s, we find that, for an infinite population,

    k_{pqr} = \kappa_p \kappa_q \kappa_r,


where the $\kappa$’s are the seminvariants (Thiele) or cumulants (Fisher).

In dealing with brackets and k’s we are dealing with those expressions
for finite sets of numbers which are entirely analogous to moments
about zero and products of moments about zero, to cumulants and
products of cumulants. If (pq), $k_{pqr}$, and so on refer to a sample, while
$\mu_p$, $\kappa_r$, and so on refer to the infinite population from which the sample
was drawn, then, for example,


    (12) is the unique symmetrical polynomial expression whose
    average for all samples is $\mu_1 \mu_2$,

    $k_{111}$ is the unique symmetrical polynomial expression whose
    average for all samples is $\kappa_1^3$.


In view of this, and of the inheritance on the average and sample sum 

relations already considered, it is a question why conventional moments 

should be calculated for samples or finite populations. Time will tell. 
Since, for example, the formulas

    (1)  = k_1
    (2)  = k_2 + k_{11}
    (12) = k_{12} + k_{111}


must reduce for the case of an infinite population to


    \mu_1 = \kappa_1

    \mu_2 = \kappa_2 + \kappa_1^2

    \mu_1 \mu_2 = \kappa_1 \kappa_2 + \kappa_1^3



we can easily find their coefficients. The single-index brackets have the 
coefficients for moments in terms of cumulants (given numerically by 
Kendall [4, Section 3.13] up to the 10th moment). The coefficients of 
brackets with several indices can be found by formal multiplication. 
Thus the conversion formulas between brackets and k’s are available 
for any reasonable order. 


THE VARIANCE 


There is no question about

    k_1 = (1) = \bar{x},


it is what we have always used. But when we come to


    k_2 = (2) - (11) = \frac{1}{n-1} \sum (x_i - \bar{x})^2


we find that we are led to divide by n-1 (and not n) whether dealing
with a sample or a finite population. The writer is one of those who
has chosen for some time to define the variance of any finite set of
numbers as $k_2$, whether the finite set is a sample or a population. It simplifies
formulas, as we shall see. He feels that the results above, which embed
the nice properties of $k_2$ among the nice properties of the other k’s,
strengthen the case for doing this.

There is one small price to pay for simplicity here. Samples of size
n from a population of size N should be taken as an infinite
population, made up of $\binom{N}{n}$ equally probable kinds of samples. This may
seem strange, but we shall endeavor to show that it is really logical.

The most obvious characteristics of a finite population are these: (i) 
If we draw a sample of elements, no value can appear more than a 
certain number of times. (ii) We cannot draw samples of larger than a 
certain size. Now consider drawing a sample of samples from a finite 
population. As ordinarily understood, this means setting up the finite
population and drawing a sample, setting up the original finite
population again and drawing another sample, and so on. We can draw as
many samples as we like, and (with very small probability) they can
come out all alike. Clearly the population of samples is infinite! We
shall adopt this understanding, and it will not cost us much except the
mental stress of distinguishing between a finite population of elements
and the infinite population of samples of one which may be drawn from
it.

SAMPLING VARIANCES 


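Now we shall use the machinery at our disposal to obtain some
results on variances and covariances in sampling of sample means and
sample variances. Three of the formulas which were suggested for
checking the numerical computation of the k’s can be written


    k_1^2   = k_{11} + \frac{1}{n} k_2

    k_1 k_2 = k_{12} + \frac{1}{n} k_3

    k_2^2   = k_{22} + \frac{1}{n} k_4 + \frac{2}{n-1} k_{22}.


We shall deduce our results from these, going back to a sample of n
from a population of N and distinguishing the population k’s with
primes.

Then


    \operatorname{var}\{k_1\} = \operatorname{ave}\{k_1^2\} - (\operatorname{ave}\{k_1\})^2
              = \operatorname{ave}\left\{ k_{11} + \frac{1}{n} k_2 \right\} - (k_1')^2
              = k_{11}' + \frac{1}{n} k_2' - \left( k_{11}' + \frac{1}{N} k_2' \right)
              = \left( \frac{1}{n} - \frac{1}{N} \right) k_2'.


Thus we obtain the formula, well-known to all who always divide by
n-1 in finding a variance: the variance of the sample mean is the
reciprocal of the sample size minus the reciprocal of the population size, all
multiplied by the population variance.

Starting from the next two formulas, we find that

    \operatorname{cov}\{k_1, k_2\} = \operatorname{ave}\{k_1 k_2\} - (\operatorname{ave}\{k_1\})(\operatorname{ave}\{k_2\})
                   = \operatorname{ave}\left\{ k_{12} + \frac{1}{n} k_3 \right\} - k_1' k_2'
                   = k_{12}' + \frac{1}{n} k_3' - \left( k_{12}' + \frac{1}{N} k_3' \right)
                   = \left( \frac{1}{n} - \frac{1}{N} \right) k_3',

    \operatorname{var}\{k_2\} = \operatorname{ave}\{k_2^2\} - (\operatorname{ave}\{k_2\})^2
              = \operatorname{ave}\left\{ k_{22} + \frac{1}{n} k_4 + \frac{2}{n-1} k_{22} \right\} - (k_2')^2
              = k_{22}' + \frac{1}{n} k_4' + \frac{2}{n-1} k_{22}'
                - \left( k_{22}' + \frac{1}{N} k_4' + \frac{2}{N-1} k_{22}' \right)
              = \left( \frac{1}{n} - \frac{1}{N} \right) k_4' + \left( \frac{2}{n-1} - \frac{2}{N-1} \right) k_{22}'.
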

The simplest form of the last formula (giving the variance of the
sample variance for finite populations in terms of population k’s)
which I have seen elsewhere appears more complex than this because
it was expressed in terms of $k_4'$ and $(k_2')^2$. Using $k_{22}'$ instead of $(k_2')^2$
makes the result simple, and since our computational routine
calculates $k_{22}$ as a step in finding $k_4$, it is as easy to use $k_{22}'$ as $(k_2')^2$, perhaps
easier.
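
The three sampling formulas can be confirmed by complete enumeration for a small population. The following Python sketch uses exact fractions; the six population values, the sample size n = 4, and all function names are assumptions chosen here for illustration (n of at least 4 is used so that the sample k_22 and k_4 exist).

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod

def bracket(powers, values):
    """Average of x_i^p x_j^q ... over ordered selections of distinct positions."""
    terms = [prod(x ** p for x, p in zip(chosen, powers))
             for chosen in permutations(values, len(powers))]
    return Fraction(sum(terms), len(terms))

def k1(v): return bracket((1,), v)
def k2(v): return bracket((2,), v) - bracket((1, 1), v)
def k3(v): return bracket((3,), v) - 3 * bracket((1, 2), v) + 2 * bracket((1, 1, 1), v)
def k4(v): return (bracket((4,), v) - 3 * bracket((2, 2), v) - 4 * bracket((1, 3), v)
                   + 12 * bracket((1, 1, 2), v) - 6 * bracket((1, 1, 1, 1), v))
def k22(v): return bracket((2, 2), v) - 2 * bracket((1, 1, 2), v) + bracket((1, 1, 1, 1), v)

population = [1, 3, 7, 25, 4, 10]
N, n = len(population), 4
samples = list(combinations(population, n))

def ave(stat):
    """Average of a sample statistic over all equally probable samples."""
    return sum(stat(s) for s in samples) / len(samples)

var_k1 = ave(lambda s: k1(s) ** 2) - ave(k1) ** 2
cov_k1_k2 = ave(lambda s: k1(s) * k2(s)) - ave(k1) * ave(k2)
var_k2 = ave(lambda s: k2(s) ** 2) - ave(k2) ** 2

shrink = Fraction(1, n) - Fraction(1, N)
print(var_k1, shrink * k2(population))
print(cov_k1_k2, shrink * k3(population))
print(var_k2, shrink * k4(population)
      + (Fraction(2, n - 1) - Fraction(2, N - 1)) * k22(population))
```

Each line prints the enumerated value followed by the value given by the formula; the two agree exactly.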

When dealing with sums drawn from one population of size N’ and
another of size N” we have


    \operatorname{var}\{k_1\} = \operatorname{ave}\{k_1^2\} - (k_1' + k_1'')^2
              = k_{11}' + 2 k_1' k_1'' + k_{11}'' + \frac{1}{n} (k_2' + k_2'')
                - \left( k_{11}' + \frac{1}{N'} k_2' \right) - 2 k_1' k_1'' - \left( k_{11}'' + \frac{1}{N''} k_2'' \right)
              = \left( \frac{1}{n} - \frac{1}{N'} \right) k_2' + \left( \frac{1}{n} - \frac{1}{N''} \right) k_2''

and the analogous formulas for cov $\{k_1, k_2\}$ and var $\{k_2\}$ follow in the
same way.

The reader can easily extend these derivations to related results.


OPTIMUM ALLOCATION 


A classical problem connected with finite populations is the optimum
allocation of samples among various strata. If the ith stratum contains
$N_i$ elements, has variance $\sigma_i^2$, and is sampled to the extent $n_i$, then
the variance of the sample mean for that stratum is

    \left( \frac{1}{n_i} - \frac{1}{N_i} \right) \sigma_i^2

and the variance of the grand mean is

    \sum_i \left( \frac{1}{n_i} - \frac{1}{N_i} \right) \sigma_i^2 \left( \frac{N_i}{N} \right)^2

which can be written

    \frac{1}{N^2} \left[ \sum_i \frac{N_i^2 \sigma_i^2}{n_i} - \sum_i N_i \sigma_i^2 \right],

and to minimize this by the choice of $n_i$ we need only minimize

    \sum_i \frac{N_i^2 \sigma_i^2}{n_i}.

If we are restricted by

    \sum_i n_i = n

we get the condition

    \frac{N_i^2 \sigma_i^2}{n_i^2} = \text{constant},

and so we should choose $n_i$ proportional to the product of $N_i$ and $\sigma_i$.


Notice that there are no special adjustments when we divide by $N_i - 1$
in defining $\sigma_i^2$.
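
The allocation rule can be illustrated numerically. In the sketch below the three strata sizes, their standard deviations, the total sample size and the function names are all hypothetical values introduced for illustration, N is taken to be the sum of the stratum sizes, and the $n_i$ are left as real numbers rather than rounded to integers.

```python
from math import isclose

def grand_mean_variance(N_i, sigma_i, n_i):
    """Variance of the grand mean for stratum sizes N_i, standard deviations
    sigma_i (defined with divisor N_i - 1) and stratum sample sizes n_i."""
    N = sum(N_i)
    return sum((Nh / N) ** 2 * (1 / nh - 1 / Nh) * s ** 2
               for Nh, s, nh in zip(N_i, sigma_i, n_i))

def optimum_allocation(N_i, sigma_i, n):
    """Choose n_i proportional to N_i * sigma_i, with the n_i adding to n."""
    weights = [Nh * s for Nh, s in zip(N_i, sigma_i)]
    total = sum(weights)
    return [n * w / total for w in weights]

N_i = [500, 300, 200]          # hypothetical stratum sizes
sigma_i = [2.0, 5.0, 10.0]     # hypothetical stratum standard deviations
n = 100                        # total sample size

optimum = optimum_allocation(N_i, sigma_i, n)
proportional = [n * Nh / sum(N_i) for Nh in N_i]

print("optimum n_i:     ", [round(x, 2) for x in optimum])
print("variance, optimum:     ", grand_mean_variance(N_i, sigma_i, optimum))
print("variance, proportional:", grand_mean_variance(N_i, sigma_i, proportional))
```

With these numbers the optimum allocation gives a smaller variance of the grand mean than allocation proportional to stratum size alone.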




REFERENCES 


[1] Oskar N. Anderson, “Einführung in die mathematische Statistik” Vienna,
Springer, 1935.

[2] R. A. Fisher, “Moments and product moments of sampling distributions”
Proc. London Math. Soc. (2) Vol. 30 (1930) 199-238.

[3] J. O. Irwin and M. G. Kendall, “Sampling moments of moments for a finite 
population” Annals of Eugenics Vol. 12 (1943-45) 138-142. 

[4] M. G. Kendall, “The advanced theory of statistics” Vol. 1 London, Charles 
Griffin, 1943. 




THE EFFECTIVENESS OF QUALITY CONTROL CHARTS* 


LEO A. AROIAN
Hunter College
AND
HOWARD LEVENE
Columbia University


The spacing and effectiveness functions of a quality control 
chart used either alone or in sets of two or more are derived for 
production at a constant level and for erratic production. The 
spacing of decision points is considered from a general point of 
view. The theory developed is fundamental in deciding which 
of two different decision techniques in quality control, each 
using the same spacing function, is the more effective. 


I. INTRODUCTION 


IN THE CLASSICAL theory of testing hypotheses, there is a certain
cause system operating, a sample of the effects of this cause system
is taken, and a decision is made about the nature of the cause system.
In the classical theory the size of this sample is determined in advance,
while in sequential analysis the sample values determine the size of the
sample needed to reach a decision. In either of these cases a decision is 
reached once and for all and no further observations need be made. 
Still another type of test often arises in quality control work, and it is 
the purpose of this paper to extend some of the principles of the classi- 
cal Neyman and Pearson theory to this type of test. In this type of 
test, which we will call a continuing test, observations are taken in time 
sequence, and at regular intervals a decision is made to follow a certain 
course of action. While this decision is based on the observations, it 
does not involve the termination of observations, which are continued 
indefinitely. For example, in quality control, an observation is made 
at stated intervals. This observation may be one or more measurements 
on one or more units of production. As a result of this observation, one 
of two actions is taken. Either we say production is in control, and pro- 
duction is allowed to continue, or we say production is out of control, 
and production is temporarily halted to look for trouble. In either case 
production ultimately resumes and further decisions are made. This 
procedure continues as long as further product is desired. In order to 
have a rational criterion for choosing the continuing test to be used, 


* Many of the results given herewith were obtained independently by the two authors. 


the long run economic effects of these tests must be determined
mathematically.

II. GENERAL THEORY 


First, suppose that production is in control. We then desire to say
that it is in control and to continue production. Nevertheless there will
usually be a probability $\alpha$ at each decision point of saying that it is out of
control. Then when we are in control, production will be stopped
unnecessarily once in every $p = 1/\alpha$ decision points in the long run. This
is a necessary part of the cost of inspection. Evidently the frequency
of such interruption depends on $\alpha$ and also on the interval between
decision points. With more frequent decision points a smaller $\alpha$ would
be desired and hence no one value of $\alpha$ can be universally used.

We will always suppose in this paper that the process was originally 
in control. Suppose now that at a certain moment the process ceases to 
be in control, and that it will stay out of control until remedial action 
is taken. Clearly, we want to recognize the lack of control and take 
action as soon as possible. The longer action is delayed the more costly 
it is, and we will give several related measures for the cost. 

Consider the following simple situation. We have chosen a continuing
test procedure that gives probability $\alpha$ of taking action at each decision
point. Suppose that the process suddenly goes out of control in such a
way that there is a constant probability $\gamma$ of taking action at each
decision point until remedial action has been taken. (We will consider
below the case where this probability varies from one decision point to the
next.) Then the decision to take action will be taken at say the Nth
decision point. Evidently the probability that $N = N_0$ is the
probability that we fail to take action at the first $N_0 - 1$ points and do take it
at the $N_0$th, or


(1)    \operatorname{Prob}\{N = N_0\} = f(N_0) = (1 - \gamma)^{N_0 - 1} \gamma.

Similarly,

(2)    \operatorname{Prob}\{N \le N_0\} = F(N_0) = 1 - (1 - \gamma)^{N_0}.

From (1) we find the moment generating function of N


(3)    M_N(t) = \sum_{N=1}^{\infty} e^{tN} (1 - \gamma)^{N-1} \gamma = \frac{\gamma e^t}{1 - (1 - \gamma) e^t}.


From (3) we find the mean and variance of N


(4)    E(N) = m_N = \frac{1}{\gamma},

(5)    \sigma_N^2 = \frac{1 - \gamma}{\gamma^2}.

The higher moments and cumulants of N can easily be found, and in 

fact the r-th cumulant is 


(6) = y (1 1)” r>l1, 
vel 

a series easily summed by finite differences. It should be noticed, how- 
ever, that no useful purpose is served by exhibiting the moments of 
higher order than the first. Moments are used either to specify the 
parameters of an exact distribution or to aid in fitting an approximate 
distribution; in the present case the probability distribution function 
f(N) and the cumulative distribution function F(N) are already known 
and are of a particularly simple form. Furthermore, it is clear from (1) 
that f(N) is a decreasing function of N, or the distribution of N is J 
shaped, and hence the exhibition of the standard deviation (which is 
always unconsciously referred to a normal distribution) may be mis- 
leading. 
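
Formulas (1), (2), (4) and (5) describe a geometric distribution and are easily checked numerically. The short Python sketch below is illustrative only; the value 0.2 for the probability of taking action and the truncation of the infinite sums at 2000 terms are assumptions made here.

```python
def f(N, gamma):
    """Equation (1): probability that action is first taken at point N."""
    return (1 - gamma) ** (N - 1) * gamma

def F(N, gamma):
    """Equation (2): probability that action is taken at or before point N."""
    return 1 - (1 - gamma) ** N

gamma = 0.2                 # illustrative value, not from the paper
points = range(1, 2001)     # the neglected tail is far below rounding error

mean_N = sum(N * f(N, gamma) for N in points)
var_N = sum(N * N * f(N, gamma) for N in points) - mean_N ** 2

print("mean:", mean_N, "  1/gamma:", 1 / gamma)
print("variance:", var_N, "  (1 - gamma)/gamma^2:", (1 - gamma) / gamma ** 2)
print("F(10):", F(10, gamma), "  sum of f(1..10):", sum(f(N, gamma) for N in range(1, 11)))
```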

Equation (1) we call the individual effectiveness function and (2) the
complete effectiveness function. The same equations with $\alpha$ substituted
throughout for $\gamma$ we call the individual stoppage spacing and complete
stoppage spacing functions respectively. The value $1/\alpha$ is the average
stoppage spacing number (A.S.S.N.) and $1/\gamma$, the average efficiency
number (A.E.N.).


III. RELATION TO NEYMAN AND PEARSON THEORY 


The quantity $\alpha$ defined above corresponds to the level of significance
in the classical theory of testing hypotheses; i.e., if the process is in
control, $\alpha$ is the probability of making a Type I error and saying we are
out of control at any given decision point. We could thus consider $\alpha$ in
setting up our test criterion; but we have seen that when we are in
control we will stop the process once in every $p = 1/\alpha$ decision points
on the average, and this seems to give a more intuitive measure of the
cost incurred. Similarly, if we are out of control, $\gamma$ corresponds to the
power of the test and $1 - \gamma$ is the probability of a Type II error. If we
let our thinking be guided by the classical theory we will want $\gamma$ to be
greater than 1/2, so as to have a reasonable chance of stopping the
process. In order to achieve this we may have to make $\alpha$ fairly large, but
then p will be small and the cost of frequent interruptions may be
excessive. On the other hand, if we think in terms of continuing tests,
we realize that if we do not stop the process at the first decision point
after going out of control, we can still do so at the second, or third, etc.
It is then reasonable to think in terms of stopping the process at the
Nth decision point after going out of control, and to require that N be
reasonably small in some sense. We may either consider the entire
distribution of N obtained above, or confine ourselves to some single
numerical measure. One reasonable measure is the expected value of N,
$1/\gamma$. Another is the number $N_\epsilon$, the smallest integer such that


(7)    \operatorname{Prob}\{N > N_\epsilon\} \le \epsilon,


where $\epsilon$ is some suitably chosen small constant. Since

    \operatorname{Prob}\{N > N_\epsilon\} = (1 - \gamma)^{N_\epsilon}

we have

(8)    N_\epsilon \ge \frac{\log \epsilon}{\log (1 - \gamma)}

and we take the smallest integral $N_\epsilon$ satisfying (8).

As we said above, $\gamma$ itself is not an intuitive measure of the
effectiveness of our procedure. The numbers $1/\gamma$ and $N_\epsilon$ are monotonic
functions of $\gamma$, and so are equivalent to it from a theoretical point of view.
But they do have simple intuitive interpretations, $1/\gamma$ being the average
delay in detecting loss of control, while for $\epsilon = .01$ say, we can state that
with 99% certainty we will have detected the loss of control by the
$N_\epsilon$th decision point.
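
Inequality (8) translates directly into a one-line computation. The Python sketch below is illustrative: the function names are introduced here, the probabilities 0.3 and 0.01 are hypothetical, and a direct search is included as a cross-check on the logarithmic formula.

```python
import math

def n_epsilon(gamma, eps):
    """Smallest integer N with (1 - gamma)**N <= eps, from inequality (8).
    Assumes 0 < gamma < 1 and 0 < eps < 1."""
    return max(1, math.ceil(math.log(eps) / math.log(1.0 - gamma)))

def n_epsilon_by_search(gamma, eps):
    """The same number found by stepping through the decision points."""
    N = 1
    while (1.0 - gamma) ** N > eps:
        N += 1
    return N

gamma, eps = 0.3, 0.01      # illustrative values, not from the paper
print("average delay 1/gamma:", 1 / gamma)
print("N_eps from (8):       ", n_epsilon(gamma, eps))
print("N_eps by search:      ", n_epsilon_by_search(gamma, eps))
```

With a 30 per cent chance of action at each decision point, the average delay is about 3.3 points, but 13 points are needed before detection is 99 per cent certain.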


IV. APPLICATION TO STANDARD CONTROL CHART PROCEDURES 


The above theory can be applied at once to those cases where a
single control chart is used. For example, if we have a fraction defective
chart, with p the true fraction defective and $\hat{p}$ the observed fraction in
a sample of n, we may wish to accept the process if $p = p_0$ and reject it
if $p > p_0$. If we say we are out of control when $\hat{p} \ge p_0 + \lambda \sqrt{p_0 (1 - p_0)/n}$
we can easily calculate $\alpha$. If the process goes out of control
with $p = p_1 > p_0$, we can calculate $\gamma$ and apply the above theory. We
may choose n so that the cost of sampling is not excessive and then
choose $\lambda$ so as to compromise between the cost of frequent stoppage,
measured by $1/\alpha$, and the cost, as measured by $1/\gamma$ or $N_\epsilon$, of continuing
for N decision points after p has changed to some feared alternative
value $p_1$.
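
For a fraction defective chart both probabilities follow from the binomial distribution. The sketch below is one way to carry out the calculation in Python; the sample size 50, the fractions 0.05 and 0.15, the multiplier 3 and the function name `action_probability` are hypothetical choices made here, and the exact binomial tail is used rather than a normal approximation.

```python
from math import comb, sqrt

def action_probability(p, n, p0, lam):
    """Probability that the observed fraction defective in a sample of n reaches
    the limit p0 + lam*sqrt(p0(1-p0)/n) when the true fraction defective is p."""
    limit = p0 + lam * sqrt(p0 * (1 - p0) / n)
    return sum(comb(n, d) * p ** d * (1 - p) ** (n - d)
               for d in range(n + 1) if d / n >= limit)

n, p0, p1, lam = 50, 0.05, 0.15, 3.0      # hypothetical chart settings

alpha = action_probability(p0, n, p0, lam)    # process in control
gamma = action_probability(p1, n, p0, lam)    # process at the feared alternative

print("alpha =", alpha, "  average stoppage spacing number 1/alpha =", 1 / alpha)
print("gamma =", gamma, "  average efficiency number 1/gamma =", 1 / gamma)
```

Repeating the calculation for several values of the multiplier shows the compromise described above: a larger multiplier lengthens the spacing of needless stoppages but also lengthens the delay in detecting the change.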

Sometimes two or more charts are run simultaneously (for example, 
charts on mean and standard deviation of some measured product). 




We say the process is out of control if either chart is out of control. The above analysis holds if $\alpha$ and $\gamma$ represent the probability that we say the process is out of control, i.e. that at least one of the charts will be outside its control limits. If $\alpha_1$ and $\alpha_2$ are the probabilities for the separate charts we know that $\max(\alpha_1, \alpha_2) \le \alpha \le \alpha_1 + \alpha_2$. Furthermore, if the two charts are independent, as are mean and standard deviation or mean and range if our measurements are normally distributed, then $\alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2$. If the two tests are not independent, a more complicated investigation may be needed to find $\alpha$. The above results apply to $\gamma$ without any change, and can easily be extended to the case of more than two control charts. It would seem from these considerations that $\alpha_1$ and $\alpha_2$ should be somewhat smaller than they would be chosen if only one chart were used.
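For independent charts the combined probability of action, and the per-chart value needed to hold a desired overall value, can be sketched as below. The function names and the figure 0.0027 are illustrative assumptions only.

```python
def combined_alpha(alphas):
    """Probability that at least one of several independent charts signals."""
    none_signal = 1.0
    for a in alphas:
        none_signal *= 1.0 - a
    return 1.0 - none_signal

def per_chart_alpha(overall_alpha, charts):
    """Equal per-chart probability giving the desired overall probability."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / charts)

a1 = a2 = 0.0027                      # illustrative value only
both = combined_alpha([a1, a2])       # equals a1 + a2 - a1 * a2 for two charts
print(both, per_chart_alpha(0.0027, 2))
```

The second function makes the closing remark concrete: to keep the same overall rate of false stops with two charts, each chart must be run with a somewhat smaller probability than a single chart would use.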


V. ERRATIC PRODUCTION 


It has been supposed above that once the process has gone out of control there was a constant probability $\gamma$ of taking action at each decision point. We now suppose that the probability of taking action at the $i$th decision point is $\gamma_i$, where the $\gamma_i$ are given constants. The probability that the process will be stopped at the $N$th decision point is then

$$f(N) = \gamma_N \prod_{i=1}^{N-1}(1-\gamma_i) \tag{9}$$

and the probability that we stop not later than the $N$th observation is

$$F(N) = 1 - \prod_{i=1}^{N}(1-\gamma_i). \tag{10}$$

The mean of $N$ can be written as an infinite series that is not easily summed, so that it will probably be easier to use the $100(1-\epsilon)$th percentage point defined above as $N_\epsilon$.

Since the expression (10) is not too easy to evaluate in a simple mathematical form, it may be helpful to give upper and lower bounds for it. Let

$$\bar{\gamma}_N = \frac{1}{N}\sum_{i=1}^{N}\gamma_i. \tag{11}$$

Then,

$$F(N) = 1 - \prod_{i=1}^{N}(1-\gamma_i) = \sum_{i=1}^{N}\gamma_i - \sum_{i<j}\gamma_i\gamma_j + \cdots + (-1)^{N+1}\prod_{i=1}^{N}\gamma_i. \tag{12}$$

Since (12) is an alternating series we have

$$F(N) \le \sum_{i=1}^{N}\gamma_i = N\bar{\gamma}_N,$$
$$F(N) \ge 1 - (1-\bar{\gamma}_N)^N,$$
$$F(N) \ge \max_i(\gamma_i),$$

as bounds of varying degrees of usefulness. In a numerical case, since $f(N)$ is readily evaluated, $N_\epsilon$ is easily determined as accurately as one might wish.
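A numerical case of this kind can be sketched as follows. The sequence of action probabilities is hypothetical, and the helper names are ours; the code simply accumulates the product of the chances of taking no action and compares the result with the bounds.

```python
def stop_cdf(gammas):
    """F(1), ..., F(N): probability of having stopped by each decision point."""
    survive, out = 1.0, []
    for g in gammas:
        survive *= 1.0 - g          # chance of no action so far
        out.append(1.0 - survive)
    return out

def bounds(gammas):
    """(lower, upper) bounds on F(N) from the mean and maximum of the gammas."""
    n = len(gammas)
    mean_g = sum(gammas) / n
    upper = min(1.0, sum(gammas))
    lower = max(1.0 - (1.0 - mean_g) ** n, max(gammas))
    return lower, upper

def n_eps(gammas, eps):
    """First decision point by which we have stopped with probability >= 1 - eps."""
    for i, F in enumerate(stop_cdf(gammas), start=1):
        if F >= 1.0 - eps:
            return i
    return None                     # not reached within the given sequence

# Hypothetical drift: the process deteriorates over the first few decision points.
gammas = [0.05, 0.10, 0.20, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40]
F = stop_cdf(gammas)
lower, upper = bounds(gammas)
print(F[-1], lower, upper, n_eps(gammas, 0.05))
```

In this example the upper bound is trivial (the sum of the probabilities exceeds one), while the bound based on the mean is fairly close, which illustrates the remark about varying degrees of usefulness.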

We obtain a case that is at once very simple and of fairly general applicability, if we suppose that $\gamma_i$, the probability of the $i$th decision point indicating lack of control, is itself a random variable. More precisely, we suppose the sequence $\gamma_i$ is a sample of independent observations on a random variate $\gamma$ which is distributed on the interval (0, 1) with a completely arbitrary elementary probability distribution $g(\gamma)$. Then the probability of rejection at the $i$th decision point is

$$m_\gamma = \int_0^1 \gamma\, g(\gamma)\,d\gamma \tag{13}$$

if $\gamma$ has a continuous distribution, or the corresponding sum if $\gamma$ is discrete. Then since the $\gamma_i$ are independent, the formulas of Sections II and III hold with $m_\gamma$ replacing $\gamma$.

This last model applies to erratic production, where a machine may be more or less out of adjustment at various times, provided there is a constant probability of any degree of lack of adjustment, and that this is independent of the situation at the previous decision point.

It should be noted that if, for example, we have a measured characteristic that is normally distributed with mean $\mu$, and $\mu$ itself has a probability distribution $g(\mu)$, we do not substitute the average $\mu$ in our formulas, but average the $\gamma$ values corresponding to the different values of $\mu$. Naturally, the theory of this section takes care of production at a constant level with changing sample sizes. The case of erratic production for two or more charts follows along similar lines and will be given in detail in a future paper.
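The warning against substituting the average mean can be illustrated with a small sketch. We assume, purely for illustration, a chart on single standardized observations with action limits at plus and minus 3, and a mean that takes one of two hypothetical values with equal chance; the names are ours.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def action_prob(mu, lam=3.0):
    """Chance that one standardized observation falls outside +/- lam when the mean is mu."""
    return 1.0 - Z.cdf(lam - mu) + Z.cdf(-lam - mu)

# Hypothetical erratic process: the mean is 0.5 or 3.5, each with probability 1/2.
levels = {0.5: 0.5, 3.5: 0.5}

mean_level = sum(mu * w for mu, w in levels.items())
m_gamma = sum(action_prob(mu) * w for mu, w in levels.items())   # correct average
naive = action_prob(mean_level)                                   # average mean plugged in

print(mean_level, m_gamma, naive)
```

The two figures differ markedly because the action probability is far from linear in the mean; it is the first that belongs in the formulas.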


VI. THE SPACING OF DECISION POINTS 


The usual type of control chart procedure is the following. We produce one unit per minute, and every $m$ minutes we make a decision based on the last $k$ units produced. We will restrict ourselves, as is customary, to the case $k \le m$, since if $k > m$, the samples are not independent and there are mathematical difficulties. In the preceding section it was tacitly assumed that $m$ and $k$ were fixed in advance. We now consider some principles for choosing $m$ and $k$.

When $m$ was fixed, the costs of wrong decisions depended on the number of decision points passed before stopping the process. However, the actual costs depend on the number of items produced rather than the number of decision points, and when the spacing of decision points is not fixed in advance, we must consider the number of items produced. We suppose then that $\alpha$ is chosen so that when the process is in control we will stop it once every $c$ items on the average. Then we must stop once in every $c/m$ decision points, or

$$\alpha = m(1/c). \tag{14}$$


If $m = c$ we can take $\alpha = 1$, or stop the process after every $c$ items, in which case sampling is unnecessary and $\gamma = 1$ also, giving $M = c$, where $M$ denotes the number of units produced from the time the process goes out of control to the time we stop production. In general, $E(M) = mE(N) = m/\gamma$. Since $m/\alpha = c$ is a constant, it is reasonable to choose $m$ so as to make $E(M)$ a minimum, which is equivalent to making $(m/\gamma)/(m/\alpha)$ a minimum, or $\alpha/\gamma$ a minimum. Now as $m$ becomes smaller, $\alpha$ becomes smaller and $\gamma$, which for a fixed alternative to control and fixed $k$ depends only on $\alpha$, decreases also. Now, if we are using
a most powerful test at each decision point, it follows from the fundamental lemma of Neyman and Pearson theory that $\alpha/\gamma$ is a monotonic non-decreasing function of $\alpha$; and hence in general $E(M)$ decreases as $m$ decreases and $k$ is the optimum value for $m$. Most of the tests in
common use for control charts are either most powerful tests or related 
to them in such a way that the same conclusion will hold. This result is 
intuitively plausible, since we are holding k fixed, and hence, the more 
frequent the decision points are, the more observations we take per unit 
time, and the sooner we will have enough evidence of lack of control to 
stop the process. 

We have thus seen that for every fixed $k$, the best procedure in the above sense is to make a decision at every $k$th unit, based on all the units produced since the last decision. Under these conditions the proper choice of $k$ is a more complicated problem than was the choice of $m$, and we will consider it only for a specific case. Consider a control chart for means. When the process is in control $x$ is normally distributed with known mean and variance, which we suppose to have been made equal to 0 and 1 respectively by a suitable transformation of variables. We suppose that the process goes out of control by the true mean of $x$ becoming $\mu \ne 0$, where $\mu$ is a given constant. (This is usually known as the process going into control at a new level.)

Let

$$\phi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} \tag{15}$$

be the elementary probability function of a standardized normal variable, and

$$\Phi(x) = \int_{-\infty}^{x}\phi(t)\,dt \tag{16}$$

be the corresponding cumulative distribution function.

Let $\bar{x}_i$ be the arithmetic mean of the $i$th group of $k$ observations; then we stop the process if

$$|\bar{x}_i| \ge \lambda_k/\sqrt{k}, \tag{17}$$

where $\lambda_k$ is defined by the equation

$$1 - \Phi(\lambda_k) + \Phi(-\lambda_k) = \alpha_k, \tag{18}$$

where $\alpha_k = k/c$, and $c$ is a constant. The power of the test is $\gamma_k$, and is given by

$$\gamma_k = 1 - \Phi(\lambda_k - \mu\sqrt{k}) + \Phi(-\lambda_k - \mu\sqrt{k}), \tag{19}$$

and $E(M) = k/\gamma_k$. Under these conditions the minimum value of $E(M)$ will usually be reached for some value of $k$ that is greater than 1 but less than $c$. This value of $k$ will depend on the value of $\mu$, and hence this line of reasoning does not lead to a unique choice of $k$, but merely to general principles to be considered in choosing $k$.

The following numerical example will make the above ideas clearer. Suppose that when the process is in control we wish to stop it once in every 1,024 units on the average. We stop when $|\bar{x}| \ge \lambda_k/\sqrt{k}$. Table 1 gives the values of $\lambda_k/\sqrt{k}$, $E(M)$ and $M_{.05}$ (a number such that Probability $\{M > M_{.05}\} = .05$), for various values of $\mu$ and $k$.

One of the most striking things in Table 1 is the wide range of values of $\lambda_k/\sqrt{k}$ corresponding to the same cost of stopping production when in control. Since the quantity $\lambda_k/\sqrt{k}$ corresponds to the multiplier of sigma in the usual terminology of 2 sigma limits, 3 sigma limits, etc., it becomes evident that there can be no such thing as a universally best multiplier to use. The choice must be based on sample size, frequency of decision points and the cost of unnecessary stoppages.


TABLE 1

 $\mu$      $k$     $\lambda_k/\sqrt{k}$     $E(M)$     $M_{.05}$
  .1          1          3.29              943         2828
  .1          4          1.443             864         2576
  .1          9           .874             758         2213
  .1        100           .1656            384          995
  .1       1024          0.               1024         1024
 1.0          1          3.29               91          270
 1.0          4          1.443              21           58
 1.0          9           .874              13.9         26
 1.0        100           .1656            100          100
 1.0       1024          0.               1024         1024
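Figures of the kind shown in Table 1 can be recomputed with a short sketch. It assumes the two-sided rule on the group mean, takes the action probability per decision point as the fraction $k/c$ with $c = 1024$, and treats the 5% point of $M$ as $k$ times the unrounded solution of $(1-\gamma_k)^N = .05$ (with at least one decision point); that last convention is our assumption, as are the function and variable names.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def chart_row(mu, k, c=1024, eps=0.05):
    """Return (lambda_k / sqrt(k), E(M), M_eps) for a two-sided chart on means."""
    alpha = k / c                              # action probability while in control
    if alpha >= 1.0:                           # stop after every c items, no sampling
        return 0.0, float(k), float(k)
    lam = Z.inv_cdf(1.0 - alpha / 2.0)         # two-sided normal deviate
    shift = mu * math.sqrt(k)
    gamma = 1.0 - Z.cdf(lam - shift) + Z.cdf(-lam - shift)   # power at mean mu
    expected_units = k / gamma
    if gamma >= 1.0:
        m_eps = float(k)
    else:
        m_eps = max(float(k), k * math.log(eps) / math.log(1.0 - gamma))
    return lam / math.sqrt(k), expected_units, m_eps

for mu in (0.1, 1.0):
    for k in (1, 4, 9, 100, 1024):
        mult, e_m, m05 = chart_row(mu, k)
        print(f"{mu:4.1f} {k:5d} {mult:7.4f} {e_m:8.1f} {m05:8.1f}")
```

The recomputed values agree closely with the tabulated ones for the larger group sizes; small differences for $k = 1$ are to be expected from the rounding of the normal tables.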


The above model applies to the case where expensive items are being produced, and the cost of inspection is negligible. However, the same mathematical model, with only slight changes, applies to a quite different situation. We now suppose that sampling and testing are relatively expensive and that only a fixed proportion of the units can be tested. The problem is then whether to divide them up into one large sample or many small ones. Because we sample a fixed fraction of the units we can measure costs of wrong decision either in terms of number of units produced or number of units tested. But if we work in terms of number of units tested, we are testing all units considered, and the theory obtained above for unlimited testing applies, with only a change in the interpretation of $M$ and $k/\alpha$.

In this section we have tacitly assumed that the process goes out of control immediately after a decision point. If the process goes out of control somewhere between decision points, the probability of stopping at the first following decision point will be less than $\gamma$. This will somewhat alter the numerical results, but the effect will not be serious if $1/\gamma$ is greater than two or three.


VII. REFERENCES 


For an outline of the classical Neyman and Pearson theory, the 
reader is referred to H. Cramer’s Mathematical Methods of Statistics 
(Princeton University Press) or to M. Kendall’s The Advanced Theory 
of Statistics, Vol. II (C. Griffin and Co., London), where the original 
papers are also cited. 




John M. Howell (Control Chart for Largest and Smallest Values, Annals of Mathematical Statistics 20, 305-309) has compared the simultaneous use of charts for mean and standard deviation with the use of the chart based on the largest and smallest value, using the relative values of $\gamma$ and $N_\epsilon$ (in our notation) for the two procedures. However, he did not give a general theory.




TWO-CHOICE SELECTION 


Irwin Bross 
Johns Hopkins University* 


In many practical applications we wish to select one product 
(commodity, plant variety, technique, etc.) from a set of 
alternative products. When there are only two products to be 
considered, this will be called the two-choice selection prob- 
lem. Several alternative rules are derived for determining 
sample sizes, i.e., the amount of data on which the selection 
is to be based. 


I. SELECTION MODEL


WHEN WE ARE required to make a choice between several products, we may wish to subject the competing products to trials or tests. If quantitative measures of performance are the basis for selection, we face a statistical problem.

We consider here a simple special case of the problem. Two products 
are under consideration and the procedure for selection is already speci- 
fied. The two products are equally expensive, and there is no reason, 
on the basis of previous experience, to favor either product. 

For the sake of definiteness let us suppose that light bulbs from two manufacturers, A and B, are being tested. In the test we measure the service-life of $n$ bulbs from each manufacturer, and obtain $\bar{x}_A$ and $\bar{x}_B$, the average service-lives of the two kinds of bulbs.

We then calculate $y = \bar{x}_A - \bar{x}_B$, and if $y$ is greater than or equal to zero, we purchase A-bulbs.

This particular selection scheme follows from the common sense 
dictum: “Choose the product with the best performance.” It may also 
be justified mathematically [2]. 

We will not be concerned here with the scheme itself but only with 
the one quantity which is at the disposal of the experimenter, n, the 
number of bulbs (of each kind) tested. The question we would like to 
answer is “How large should n be?” 

Evidently if $n$ is large, our chance of picking the better product will be improved, but on the other hand our testing program will be expensive. The advantage derived from additional information must be balanced against the cost of that information.


* Department paper No. 264. Part of the work was done under Navy Contract N6onr-24314. 
1 A point set theory justification was provided by Referee A. 




Several different ways of making this balance will be examined in the 
sequel. First, however, some additional assumptions will be made in 
order to obtain a simple mathematical formulation of the problem. 

Let us assume that $y$ is normally distributed with mean $\delta$ (the difference of the true means of the A and B bulbs) and variance $\sigma_y^2$. Let $\bar{x}_A$ and $\bar{x}_B$ be independently distributed with the same variance, $\sigma^2/n$, so that

$$\sigma_y^2 = \frac{2\sigma^2}{n}. \tag{1.01}$$

This is not as drastic an assumption as it might appear to be. If the 
distribution of the service-lives for A bulbs differs from that of B bulbs 
only in the location parameter (i.e. mean) then y is symmetrically dis- 
tributed with variance given by (1.01). For moderately large values of 
n, the distribution will not differ much from the normal prototype un- 
less the distributions of service-lives are extremely skew. Even then a 
simple transformation (for example to logarithms) may make a normal- 
ity assumption tenable. 

In order to obtain the simplest possible model we shall further assume that $\sigma^2$ is known. In practice, of course, one would use an estimate of $\sigma^2$ to obtain an estimate of the sample size $n$. The results of Section IV indicate that an approximate value of $n$ is all that is really needed.

This special case has some practical importance and also provides 
insight into the nature of the new pragmatic statistical methodologies 
which are becoming popular [1]. Pragmatic methodologies emphasize 
costs and losses rather than probabilities. 


II. CONTROL OF PROBABILITIES 


At first glance the balanced two-choice selection problem greatly 
resembles the problem of testing the significance of the difference be- 
tween two products. However, there is a different purpose in selection 
than in hypothesis testing. 

If, in fact, there is no difference between the two products, then we 
cannot make a “wrong” choice, so we do not have to be protected 
against erroneously believing that a difference does exist. If, on the 
other hand, a real difference exists between the two products, it does 
not worry us if we fail to establish its statistical significance. 

It is possible, however, to continue to think in terms of probabilities by injecting the cost considerations in a “concealed” form. For any given value of $\delta$ it is not difficult to calculate the probability of making the wrong choice. Since the problem is symmetric with respect to $\delta$, this quantity may be considered positive without loss of generality. The wrong choice will be made if $y = \bar{x}_A - \bar{x}_B$ is negative. Let

$$t = \frac{y - \delta}{\sigma_y};$$

then the probability that we will make the wrong choice, $P_e$, is

$$P_e = P(y < 0) = P(\sigma_y t + \delta < 0) = P(t < -a) = \int_{-\infty}^{-a} f(t)\,dt \tag{1.02}$$

where $f(t)$ is the standard normal density and

$$a = \frac{\delta}{\sigma_y} = \frac{\delta}{\sigma}\sqrt{\frac{n}{2}}.$$

With the aid of a normal integral table the probability that we will make the wrong choice may be calculated for various values of $\delta$, $\sigma$, and $n$. It is convenient to measure $\delta$ in terms of $\sigma$, or in other words to consider $\theta = \delta/\sigma$ and use this in tabulating $P_e$. A table of $P_e$ has been constructed by Churchill Eisenhart [3].

We cannot determine $n$ directly from (1.02) because of the presence of the unknown quantity, $\delta$, but this difficulty can be circumvented as follows. We ask the purchaser to tell us how large a difference he would consider of practical importance. Suppose he names some quantity $\delta_0$. We then ask him how sure he wants to be of making the right choice when a difference of this size turns up. Suppose he wants to make the right choice 95% of the time, or more generally $(100 - \alpha)\%$ of the time.

If, as has been assumed, $\sigma$ is known, then it becomes a very simple matter to find a value of $n$, say $n^*$, which will satisfy these conditions. Substitute $P_e = .05$ in equation (1.02) and solve for $a$ by means of the normal tables. Then $a$ is about 1.66. Now since

$$a = \frac{\delta_0}{\sigma}\sqrt{\frac{n}{2}},$$

the desired value of $n$ is

$$n^* = \frac{(1.66)^2\,2\sigma^2}{\delta_0^2}. \tag{1.03}$$

If the purchaser wants to catch a difference as large as the standard error, $\sigma$, then

$$n^* = (1.66)^2(2) \approx 6$$

and six bulbs of each type should be tested.
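The probability-control rule can be checked numerically with the sketch below. It uses the normal deviate computed by the standard library (about 1.645 for 95%) rather than the rounded 1.66 quoted from the tables, so the unrounded sample size is slightly smaller; the function names are ours and the inputs are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def error_prob(delta, sigma, n):
    """Chance that the poorer product shows the higher sample mean, as in (1.02)."""
    a = (delta / sigma) * math.sqrt(n / 2.0)
    return Z.cdf(-a)

def sample_size(delta0, sigma, p_right=0.95):
    """Unrounded n* in the manner of (1.03), using the computed normal deviate."""
    a = Z.inv_cdf(p_right)
    return 2.0 * (a * sigma / delta0) ** 2

# Difference of practical importance equal to one standard deviation.
n_star = sample_size(delta0=1.0, sigma=1.0)
n_used = math.ceil(n_star)
print(n_star, n_used, error_prob(1.0, 1.0, n_used))
```

Rounding up to six bulbs of each kind gives a probability of wrong choice a little under 5% when the true difference is exactly the one named by the purchaser.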

We can now argue that if $\delta > \delta_0$ we have an even better chance of making the right choice, while if $\delta < \delta_0$ we don’t worry too much because the difference is not important economically.

The solution is very nice insofar as the statistician is concerned. It is quite simple and, better yet, the statistician has made no assumptions beyond those in the model. As the grateful purchaser leaves the statistician’s office, he may get some parting advice. It is up to him to determine $\delta_0$ wisely and to remember that $\delta_0$ depends on a number of factors, such as how many light bulbs are going to be purchased as a result of this test. The larger the purchase, the more important small performance differences will be. However, the statistician will hasten to add, if the purchaser wants to pick up small differences, the testing program will be expensive.

While all this advice is sound, it does not hide the fact that the 
statistician has very cleverly handed the hard part of the problem right 
back to the purchaser! 


III. CONTROL OF LOSSES 


Statisticians have realized, almost reluctantly, that in applied problems they have to be cost accountants as well. The necessity of dealing with costs in sampling surveys, agricultural field trials, and industrial inspection has led to the development of pragmatic methodologies.² In these methodologies a statistician does not merely provide tools for deriving statements from data (i.e. rejecting a hypothesis). He must deal with the possible lines of action and the cost-consequences of these actions.

Let us consider the consequence of choosing the wrong product. If the service-life of a bulb is measured in hours, if the service is worth $C_1$ dollars per hour, and if $N$ bulbs are purchased, then a wrong decision will cost $NC_1\delta$ dollars. The probability that we will choose the wrong product, $P_e$, is given by (1.02). Hence the expected loss due to errors of our selection scheme is $NC_1\delta P_e$ dollars.


2 I should like to acknowledge several valuable suggestions and criticisms made by Professor
W. G. Cochran during the preparation of this manuscript.




It also costs us money to make the tests, say $2nC_2$ dollars, where $C_2$ is the cost of testing a light bulb. There may also be a constant cost, or overhead, in connection with the testing program, but this term can be neglected without loss of generality and the expected cost of the selection scheme written as

$$C = NC_1\delta P_e + 2nC_2. \tag{1.04}$$

While our general purpose is to choose $n$ in such a way as to make the cost of selection (1.04) as small as possible, we cannot proceed directly from (1.04) because it involves the unknown quantity $\delta$.

Consider how the cost varies with $\delta$ when $n$ is fixed. As $\delta$ approaches zero, the loss becomes small because the first term in (1.04) tends to vanish. As $\delta$ becomes large, it will be seen from (1.02) that the loss becomes small again since $P_e$ tends to vanish. At some intermediate point, therefore, the loss is a maximum, and this maximum point can be found by ordinary calculus since the cost is a continuous function. Thus

$$\frac{\partial C}{\partial \delta} = 0 = NC_1\int_{-\infty}^{-a} f(t)\,dt - NC_1\,a f(a)$$

where

$$a = \frac{\delta}{\sigma_y}.$$

Hence

$$\int_{-\infty}^{-a} f(t)\,dt = a f(a).$$

This equation may be solved with the aid of a normal integral table, and this “worst” value of $a$, say $a'$, is equal to 0.75. Since

$$\sigma_y = \sigma\sqrt{\frac{2}{n}},$$

the maximum loss is therefore

$$C_{\max} = \sqrt{\frac{2}{n}}\,(.75)\,NC_1\sigma\int_{-\infty}^{-.75} f(t)\,dt + 2nC_2.$$




Since we want to control our loss, it seems reasonable to choose n*, 
our optimum value of n, so that this maximum loss will be as small as 
possible. This is not hard to do. 

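First we note that if $n = 0$ then $C_{\max} = \infty$, so that $n = 0$ is not a solution. Now $n$ is not a continuous variable; it must be an integer, but an approximate result can be obtained by treating $n$ as if it were continuous. If our result is a fraction, a direct calculation of costs for the adjacent integers will determine which way to “round off.” Minimizing $C_{\max}$ by calculus,

$$\frac{\partial C_{\max}}{\partial n} = -\frac{\sqrt{2}}{2}\,\frac{(.75)\,NC_1\sigma}{n^{3/2}}\int_{-\infty}^{-.75} f(t)\,dt + 2C_2 = 0,$$

$$n^* = \left[\frac{(.75)\,NC_1\sigma}{2\sqrt{2}\,C_2}\int_{-\infty}^{-.75} f(t)\,dt\right]^{2/3}. \tag{1.06}$$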


The value of n* obtained by this method now depends on the costs 
and consequences of the selection scheme. The probabilities are not 
explicitly involved. 

For example, if $N = 10{,}000$; $C_1 = .01$ dollars per hour; $C_2 = .50$ dollars per bulb; and $\sigma = 10$ hours, then (1.06) leads to $n^*$ of about 25. In other words, 25 bulbs of each kind should be tested.
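The minimax calculation can be retraced numerically as follows. This is a sketch under our own assumptions: the worst value of $a$ is found by bisection rather than from tables, the sample size is the continuous minimizer of the maximum cost, and the function names are ours. With the costs of the example the unrounded result falls between 24 and 25, and the maximum cost is almost identical at either integer.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def worst_a(lo=0.1, hi=2.0, tol=1e-10):
    """Solve Phi(-a) = a * f(a) by bisection; the text quotes about 0.75."""
    def h(a):
        return Z.cdf(-a) - a * Z.pdf(a)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def max_cost(n, N, c1, c2, sigma, a):
    """Largest expected cost over delta for n bulbs of each kind."""
    return a * Z.cdf(-a) * N * c1 * sigma * math.sqrt(2.0 / n) + 2.0 * n * c2

def minimax_n(N, c1, c2, sigma, a):
    """Continuous n minimizing max_cost, from setting the derivative to zero."""
    return (a * Z.cdf(-a) * N * c1 * sigma / (2.0 * math.sqrt(2.0) * c2)) ** (2.0 / 3.0)

a_prime = worst_a()
n_star = minimax_n(N=10_000, c1=0.01, c2=0.50, sigma=10.0, a=a_prime)
print(a_prime, n_star)
for n in (24, 25):
    print(n, max_cost(n, 10_000, 0.01, 0.50, 10.0, a_prime))
```

The flatness of the maximum cost near the optimum means the choice between adjacent integers is of little practical consequence.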

If we assumed that a difference of five hours of service-life is im- 
portant (i.e. we could lose or gain $500 by the test) and if we wanted 
to catch a difference of this size 95% of the time, then the concepts 
of control of probabilities and (1.03) would tell us to test 22 bulbs of 
each kind. 

Because we are minimizing the maximum loss the rule leading to (1.06) is called the “minimax” rule. It was devised by Abraham Wald who has developed this approach in a series of articles and in the book “Statistical Decision Functions” [4]. It is plausible practically as a hedge against severe loss if a “bad” value of $\delta$ should turn up.

It should be emphasized that the “minimax” rule is not the only one that might be employed. An upper limit to $n$ may be obtained directly from (1.04). First find the expression for the optimum value of $n$ as a function of $\delta$ and then see how large this optimum value can be. We might also plot the expected cost against $\delta$ using (1.04) directly for various values of $n$. If we had some idea of what values of $\delta$ might be expected to occur in our tests then $n$ could be chosen on the basis of the performance curves.


IV. MAXIMUM EXPECTED GAIN


In the methods so far considered we have disposed of the troublesome unknown parameter $\delta$ by choosing an ingenious rule for action. In the method of probability control we require the purchaser to give us an “economically significant” value, $\delta_0$, while in the method of loss control we agree to deal with that value of $\delta$, say $\delta'$, for which the loss is a maximum.

In practical situations previous information may be available about the nature of $\delta$. If, for example, it were known that $\delta$ would seldom be as large as $\delta_0$ or $\delta'$, it is pointless to concentrate our attention on the behavior of the selection scheme for these values of $\delta$. In fact we might decide against making tests at all.

Whether information about $\delta$ should be incorporated into a selection scheme will depend on how “good” this information is.

One method of bringing information about $\delta$ into the picture is to change our point of view and treat $\delta$ as a second random variable in the problem.

Imagine that we had data on tests of A and B bulbs at other places but that the resulting estimates of $\delta$ fluctuated from place to place. We might feel that it was necessary to test the bulbs under our peculiar conditions of usage. We might then regard our environment as a random sample from a population of environments.

On the other hand it is also possible to regard the A and B bulbs as a 
random sample from a population of different manufacturers. For 
example, if our products were two strains of wheat developed by an 
experiment station, then information might be available concerning 
the genetic variability of such strains of wheat. 

Suppose, for example, that a preliminary experiment on service-life had been run and bulbs manufactured by a large number ($m$) of different companies had been tested. If $n_0$ bulbs of each company were tested then we would construct Table 1.


TABLE 1
ANALYSIS OF VARIANCE

Source of Variation     d.f.           Mean Square     E(Mean Square)
Between brands          $m-1$          $M_1$           $\sigma^2 + n_0\sigma_b^2$
Within brands           $m(n_0-1)$     $M_2$           $\sigma^2$


With the situation of Table 1 in mind the mathematical model may be extended. The true means of the A and B bulbs, $\mu_A$ and $\mu_B$, may be regarded as samples from the superpopulation of manufacturers. This may be symbolized by writing

$$\mu_A = \mu + e_A$$
$$\mu_B = \mu + e_B$$

where $e_A$ and $e_B$ are regarded as quantities which are normally and independently distributed with mean zero and variance $\sigma_b^2$. By definition

$$\delta = \mu_A - \mu_B = e_A - e_B$$

hence $\delta$ is distributed normally with mean zero and variance

$$\sigma_\delta^2 = 2\sigma_b^2.$$

For the sake of simplicity we will make this assumption concerning $\delta$. It is not very realistic because it disregards the information about $\mu_A$ and $\mu_B$ in the first sample and also the method of choosing A and B for further tests. This information about the means and the first selection scheme can be utilized [5] but the process is not simple.

It is convenient to deal with expected gains rather than losses when working with the extended model. If $P_A$ and $P_B$ are the respective probabilities of selecting A and B bulbs then

$$P_A = \int_{-\infty}^{a} f(t)\,dt, \qquad P_B = 1 - P_A$$

where

$$a = \frac{\delta}{\sigma_y}.$$

The expected value of the bulbs selected is therefore

$$NC_1(\mu_A P_A + \mu_B P_B) = NC_1\mu + NC_1\delta P_A + NC_1 e_B$$

and the “gain” is this quantity minus $2nC_2$ dollars.

Since we are dealing with a second set of random variables ($e_A$ and $e_B$) we may take the expectation of the “gain” with respect to these variables. The expectation of $NC_1 e_B$ is zero since $E(e_B) = 0$. The constant terms are unaffected so the second expectation leads to

$$NC_1 E(\delta P_A) + NC_1\mu - 2nC_2.$$

The quantity $NC_1\mu$ will be omitted in the subsequent development and we will call

$$G = NC_1 E(\delta P_A) - 2nC_2$$

the expected gain due to the selection scheme. If $r = \delta/\sigma_\delta$, where $\sigma_\delta$ is the standard deviation of $\delta$, then

$$E(\delta P_A) = \sigma_\delta \int_{-\infty}^{\infty} r\,P_A\,f(r)\,dr.$$

This integral may be simplified by transforming $P_A$ to an integral with constant limits, reversing the order of integration, and integrating out the dummy variables. As a result

$$G = \frac{NC_1\sigma_b}{\sqrt{\pi}}\sqrt{\frac{n}{n+\phi}} - 2nC_2 \tag{1.07}$$

where $\sigma_b^2$ is the variance between brands and

$$\phi = \frac{\sigma^2}{\sigma_b^2}.$$

To find the optimum sample size for the extended problem $n$ will be treated as a continuous variable. Equating $\partial G/\partial n$ to zero gives

$$(n^*)^{1/2}(n^* + \phi)^{3/2} = B, \qquad B = \frac{NC_1\sigma_b\,\phi}{4\sqrt{\pi}\,C_2}. \tag{1.08}$$

We may find $n^*$ by solving

$$\lambda^4 - \phi\lambda^3 - B^2 = 0$$

and using $n^* = \lambda - \phi$.

There is one more parameter, $\sigma_b$, in (1.08) than in the minimax solution. Presumably this parameter may be estimated from Table 1.

An insight into the nature of selection schemes may be obtained by evaluating the expected gain, $G$, for different values of $n$ and $\sigma_b$. If the costs of the minimax example (Section III) are used we can apply (1.07) directly and we obtain Table 2. It is desirable to think in terms of $G$ since this single index tells us the worth of a given selection scheme and enables us to compare alternative schemes in a meaningful manner (i.e. in terms of dollars and cents).


TABLE 2
EXPECTED GAIN DUE TO SELECTION ($G$, in dollars)

 $n$     $\sigma_b = 1$     $\sigma_b = 2$     $\sigma_b = 5$     $\sigma_b = 10$
   5     [illegible]        [illegible]        205.26             515.03
  10     [illegible]        [illegible]        228.41             527.93
  20     [illegible]        [illegible]        237.52             530.59
  30     [illegible]        [illegible]        234.98             525.02
  50     [illegible]        [illegible]        221.45             508.63

The range $2 \le \sigma_b \le 10$ might be termed the “working range” of this particular selection scheme. If $\sigma_b$ is less than two it is questionable whether a second test will be worthwhile. As $\sigma_b$ becomes large the information in the first test (and a more elaborate analysis) should be utilized.

Within this “working range” the most striking feature of Table 2 is the stability of $G$ within the columns. If we use any value of $n$ over the wide range $10 \le n \le 30$ instead of the optimum value $n^*$, we shall lose very little (at most about \$10).
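The stability of the expected gain can be examined with a short sketch. It assumes the closed form obtained by carrying out the integration described for (1.07), namely $G = (NC_1\sigma_b/\sqrt{\pi})\sqrt{n/(n+\phi)} - 2nC_2$ with $\phi = \sigma^2/\sigma_b^2$, together with the costs of the minimax example; the function names and the choice of sample sizes 5, 10, 20, 30 and 50 are ours. Under that assumption the figures agree with the legible entries of Table 2, except that the entry printed as 515.03 for the smallest sample and $\sigma_b = 10$ comes out 5 dollars lower.

```python
import math

def expected_gain(n, sigma_b, N=10_000, c1=0.01, c2=0.50, sigma=10.0):
    """Expected gain in dollars from testing n bulbs of each kind (assumed closed form)."""
    phi = (sigma / sigma_b) ** 2     # within-brand variance over between-brand variance
    value = N * c1 * sigma_b / math.sqrt(math.pi) * math.sqrt(n / (n + phi))
    return value - 2.0 * n * c2

def best_n(sigma_b, n_max=200, **costs):
    """Integer sample size with the largest expected gain, by direct search."""
    return max(range(1, n_max + 1), key=lambda n: expected_gain(n, sigma_b, **costs))

for n in (5, 10, 20, 30, 50):
    print(n, [round(expected_gain(n, s), 2) for s in (1, 2, 5, 10)])

n_opt = best_n(5)
spread = expected_gain(n_opt, 5) - min(expected_gain(n, 5) for n in range(10, 31))
print(n_opt, round(spread, 2))
```

For $\sigma_b = 5$ the direct search puts the optimum near 20, and the loss from using any sample size between 10 and 30 stays under ten dollars, in line with the remark above.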

This has two important practical consequences: (1) reasonable 
approximations (such as the substitution of estimates for parameters) 
are not likely to cost us much money, and (2) any sensible rule for 
determining sample size will work about as well as the “best” scheme. 

While the above conclusions are based on the specific situation of 
Table 2, there is reason to believe [5] that they hold in many applied 
fields. 


V. COMPARISON OF APPROACHES 


Other criteria than the three methods of determining sample size 
presented here can be constructed. The methods considered here apply 
to fairly specific situations. 

The probability method (1.03) involves the least additional information. The only quantity required is an estimate of the experimental error. However, it introduces two arbitrary quantities, $\delta_0$ and $\alpha$. Presumably the cost information incorporated in the other methods enters the probability method in the choice of these quantities. For example, if the sample size required by (1.03) were excessively large, the selector might be inclined to revise $\delta_0$, or raise $\alpha$, so as to obtain a more practical result. The advantages of the method are that it requires only $\sigma^2$, that it is computationally easy, and that in extensions to more complex cases it will generally be easier mathematically. The disadvantages of the method are that the cost information is incorporated “intuitively,” that it is rather arbitrary, and that it does not take advantage of prior information that may be available.

The distinction between the loss control (1.06) and the expected gain (1.07) lies in the appearance of additional information in the latter and in the choice of two different rules of conduct. The loss control plan guarantees that the loss on any decision will not be greater than a certain amount. This is especially advantageous when a single selection is to be made. If the selection scheme is to operate routinely, then it may be more reasonable to want the long-run gain to be a maximum.

As a rough analogue, consider the card game of “Twenty-one” or 
Blackjack. Some people would prefer to be players because in this way 
they have control of their losses at each stage. On the other hand, some 
would prefer to bank the game because of the long run advantage to the 
banker. 

The results of Section IV indicate that the choice of rules is not criti- 
cal and this is an interesting point. In the field of applications of sta- 
tistics the “best” has often been the enemy of the “good”. The in- 
corporation of pragmatic concepts directly into the statistical method- 
ologies may have the effect of reconciling “best” and “good”. One can 
even hope that this will lead to a reorientation of statistics and sta- 
tisticians. 

REFERENCES 

[1] Sprouls, R. Clay, Statistical Decisions by the Method of Minimum Risk: 
An Application. Journal of the American Statistical Association, 45, 1950. 

[2] Simon, Herbert A., Symmetric Tests of the Hypothesis that the Mean of 
One Normal Population Exceeds that of Another. Annals of Mathematical 
Statistics, 14, 1943. 

[3] Eisenhart, Churchill, Probability that Sample Means are in Opposite Order
to Population Means, Ch. 14 in Eisenhart, Hastay, and Wallis (eds.),
Techniques of Statistical Analysis, McGraw-Hill, New York, 1947.

[4] Wald, Abraham, Statistical Decision Functions, Wiley, New York, 1950. 

[5] Bross, Irwin, Unpublished doctoral dissertation, North Carolina State 
College, 1949. 




OPERATIONS ANALYSIS AND THE THEORY OF 
GAMES: AN ADVERTISING EXAMPLE 


LEONARD GILLMAN 


Operations Evaluation Group, 
Massachusetts Institute of Technology 


The theory of games is applied to an advertising competi- 
tion. Functions of susceptibility and resistance to advertising 
are used in a mathematical model to determine the strategy 
and spending rates of the participants in the campaign. 


THE PROBLEMS with which I am most familiar are of a military nature,
and for this reason the advertising example I am about to
present may not be sufficiently true to life. However, it does indicate
how an assumed model and appropriate numerical information can
lead to intelligent recommendations for a course of action, and
prediction of the results.

We consider a competition between two businessmen, whom I shall 
call Mr. Big and Mr. Little. Mr. Big (who in this example may represent 
a coalition of several Mr. Bigs) has been prospering for many years, 
on the basis of a stable supply of customers. Mr. Little, on the other
hand, is on the verge of bankruptcy; if things continue at their present 
rate he will definitely be forced out of business at a stated time in the 
future (when the bank forecloses, say). 

Now at time 0 a new potential customer has just arrived on the scene. 
He may decide not to buy at all, but if he does buy he will place a large 
order—with one of the men: he will not split the order. The order is 
sufficiently large to put Mr. Little back on his feet, but to remain in 
business Mr. Little must obtain the order by D-day (call it time 1).

Mr. Big and Mr. Little now engage in an advertising competition. 
Mr. Little needs the new customer. Mr. Big would like to get the cus- 
tomer, but primarily he would like to see Mr. Little forced out of busi- 
ness; in other words, Mr. Big would regard it as a victory even if the 
customer should decide not to buy at all. 

So the two competitors engage in a zero-sum game: Mr. Little wins 
this game if he secures the customer within the allotted time; otherwise 
Mr. Big wins—if the customer does not buy at all, or if he adds insult 
to injury by ordering from Mr. Big. 

The customer will be influenced in his choice of where to buy, if he 
does buy, only by the advertising of these two men. The conditions 
are these. First, each allots a specified amount of money for the campaign.
Secondly, each advertising schedule, covering the entire time
period, 0 to 1, must be contracted for in advance—at time 0. Finally, 
the natures of the respective advertising messages are such that it is to 
Mr. Big’s advantage to distribute his effort over the campaign, while 
Mr. Little plans to concentrate his resources into one elaborate appeal. 

It is assumed that in each instance the advertiser knows which me- 
dium (newspaper, radio, and so on) represents his best buy at the 
moment, and that this is the medium he selects. The one variable about 
which he must make a decision, then, is time—when to advertise. 

I shall suppose that the customer’s resistance to buying at all de- 
creases with the passage of time, due to causes independent of the ad- 
vertising in question. (For instance, my kids may wangle a television 
set out of me yet.) 

Thus with each given ad which appears at a given time, there is asso- 
ciated a probability that the customer will take note of the ad and de- 
cide to buy from the advertiser, and I am supposing that this probabil- 
ity is an increasing function of time. 

Now the operations analysts who advise Mr. Big and Mr. Little 
have previously investigated many similar competitions, and have dis- 
covered a number of pertinent facts. The successive decisions that the 
customer makes have been found to be independent of one another— 
until he decides definitely to buy, and from whom; after that he will 
not change his mind. (Thus if the customer has not yet succumbed, 
then the probability that he succumb to a given ad is the same whether 
or not any advertising has appeared previously; but if he has already 
made his decision to buy from a particular man then it is entirely irrel- 
evant whether or not any further ads appear.) It has been discovered 
in addition that the customer’s reaction to the advertising at any given 
time is independent of its manner of distribution, so that, for example, 
one sixty-dollar ad is equivalent to two thirty-dollar ads. (Of course all 
this is hypothetical, not proven customer psychology.) 

The problem may now be formulated as follows. For simplicity I 
will consider the campaign as a mathematically continuous affair. Mr. 
Little is to choose an instant t, between 0 and 1 inclusive, at which to
advertise. Mr. Big, on the other hand, chooses a function S(t), which is 
his rate of spending over the time interval (0, 1); S(t) is non-negative, 
and its integral from 0 to 1 is M, the total amount of money set aside 
by Mr. Big. 

For Mr. Little the fundamental quantity is a function L(t), the
probability that the customer, if still available at time t, will succumb to
Mr. Little's ad if it appears at that time. The fundamental quantity
for Mr. Big is described somewhat differently: it is a function B(t),
the probability that one dollar's worth of advertising, invested at time
t, will secure the customer, if he is still available at that time. (More
precisely, B(t) S(t) is the probability density associated with spending
rate S(t).)

The functions L(t) and B(t) may be called the susceptibility functions. 
They are assumed known to both sides (from analysis of previous simi- 
lar situations), and, as I stated earlier, these functions are assumed to 
be increasing with time. 

Mr. Little attempts to maximize—and Mr. Big to minimize—the 
probability that Mr. Little secure the customer. This probability will 
depend upon the strategies employed—upon Mr. Little’s choice of the 
time t at which to advertise, and upon Mr. Big's rate of spending, S(t):
these are the only factors in the problem under the players’ control. 

For fixed choices of these strategies, t and S, the probability that Mr.
Little win the game is equal to the probability that the customer resist
Mr. Big to time t, and then succumb to Mr. Little, that is, equal to
$R_S(t)\,L(t)$, where $R_S(t)$ denotes the probability of resisting Mr. Big to
time t. The assumptions of independence which we have made lead to
the following determination of the resistance function:

$$\log R_S(t) = -\int_0^t S(z)\,B(z)\,dz. \tag{1}$$



Mr. Little’s optimal strategy, according to the theory of games cri- 
terion, must be such as to guarantee him a certain probability, v, at 
least—whatever his opponent does; and Mr. Big’s optimal strategy is to 
restrict Mr. Little to at most this same value v. It turns out that Mr.
Little must adopt a mixed strategy, that is, he must choose his t at random
from a certain definite probability distribution; while Mr. Big’s optimal 
strategy is a pure one—that is, if this game were repeated Mr. Big 
would spend at the same rate in each instance. 

In fact it can be shown that Mr. Big’s optimal strategy is to spend at 
such a rate as to hold Mr. Little's win probability constant, independent
of his choice of t, from as early as possible down to time 1. This idea
was proposed by Herbert Weiss, of Aberdeen, in examining a game of a 
similar form. Lloyd Shapley, working at RAND, had pointed out that 
Mr. Big need not consider mixed strategies whenever the resistance func- 
tion satisfies a relation of the type (1). It is then not difficult to show 
that Weiss’ conjecture is correct. 

Here is what we have, then. The susceptibility function L(t) increases
with time. The resistance function $R_S(t)$ is non-increasing with time,
whatever the spending rate S(t). Mr. Big is so to choose this rate that
the product $R_S(t)\,L(t)$ remains constant. The opening moment of Mr.
Big's campaign, time $t_0$, is then determined by how much money, M,
he has available.

The spending rate which has the required property, $S_0(t)$, is equal
to the derivative of L, divided by LB:

$$S_0(t) = \frac{L'(t)}{L(t)\,B(t)}, \tag{2}$$

and from time $t_0$ on the resistance function will have the property that

$$R_{S_0}(t) = \frac{L(t_0)}{L(t)}. \tag{3}$$

The opening of the campaign, time $t_0$, is then found from the equation

$$M = \int_{t_0}^{1} S_0(t)\,dt \tag{4}$$

(for present purposes we are supposing that a solution exists, and is
positive).
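
The step from the constancy requirement to equations (2) and (3) can be written out briefly. This is a sketch only, assuming that L and B are positive and differentiable on the interval from $t_0$ to 1 and that Mr. Big spends nothing before $t_0$, so that $R_{S_0}(t_0) = 1$:

```latex
\begin{align*}
R_{S_0}(t)\,L(t) &= R_{S_0}(t_0)\,L(t_0) = L(t_0)
  && \text{(constancy from } t_0 \text{ on)} \\
\log R_{S_0}(t) &= \log L(t_0) - \log L(t)
  && \text{(this is equation (3))} \\
-S_0(t)\,B(t) &= -\frac{L'(t)}{L(t)}
  && \text{(differentiate, using equation (1))} \\
S_0(t) &= \frac{L'(t)}{L(t)\,B(t)}
  && \text{(this is equation (2))}
\end{align*}
```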

Mr. Big's optimal strategy is then: don't advertise at all until time
$t_0$; thereafter spend at the rate $S_0(t)$. And then Mr. Little's maximum
probability of success will be $L(t_0)$; he obtains this probability if he
advertises at time $t_0$ or later, but the smaller number L(t) if he advertises
at any t earlier than $t_0$. Of course the more money available to Mr. Big
the earlier he can start spending, and the lower he can depress Mr.
Little's maximum probability of success.

I should like to point out that the constancy principle just described,
while perhaps appearing reasonable, may not necessarily have been
obvious. (In fact it is wrong in case B(t) does not always increase with
time.) Notice that it implies that the optimal rate of spending depends
upon the logarithmic derivative of L, that is, upon its relative rather
than its absolute rate of change. For example, if L(t) were doubled throughout,
the spending rate $S_0(t)$ would remain unchanged, since the factor 2 would
cancel out in equation (2), and $t_0$ would be the same as before. (However,
Mr. Little's chance of success, $L(t_0)$, would of course be doubled.)

Another feature of the solution which is not evident beforehand is this.
Suppose, for example, that the susceptibility functions L(t) and B(t)
were both proportional to t; then $S_0(t)$ would be inversely proportional
to $t^2$. In other words Mr. Big would spend at a continuously decreasing
rate, something he may not have guessed, but which appears to be a
typical circumstance.

Mr. Little also has an optimal strategy. Its use assures him of an
expectation at least $L(t_0)$, whether Mr. Big adopts strategy $S_0$ or any
other instead. This one was a little harder to work out. I should remark
that the solution to this game was also obtained by Professor David
Blackwell of Howard University, while working with his associates at
RAND.

Mr. Little's optimal strategy will be a certain probability distribution,
the probability that he will advertise at or before time t. For
purposes of a comparison I wish to make I will refer instead to the
complementary quantity, the probability that he will advertise later than time
t. It turns out that this probability, P(t), is unity up to time $t_0$ (that is,
Mr. Little, like Mr. Big, will never advertise earlier than $t_0$) and
thereafter it is equal to $B(t_0)/B(t)$:

$$P(t) = \frac{B(t_0)}{B(t)} \qquad (t_0 \le t < 1). \tag{5}$$

(The mathematician will observe that the function P(t) has a jump at
t = 1.)

Now the probability P(t) can also be regarded as a resistance function:
since the customer is certain to resist if no ad appears, P(t) is
the probability that the customer resist Mr. Little because he did not
advertise, whereas the resistance function $R_{S_0}(t)$ described earlier is
the probability that the customer resist Mr. Big in spite of his
advertising. Thus Mr. Little's resistance function is a ratio in terms of Mr.
Big's susceptibility function (equation (5)), while Mr. Big's resistance
function is the corresponding ratio in terms of Mr. Little's susceptibility
function (equation (3)). Again, I believe this relationship would
have been difficult to anticipate in this asymmetrical problem in which
one man's strategy is a point while the opponent's is a function. (Of
course both resistance functions are based upon all the given information,
since the determination of $t_0$ depends upon all the given quantities.)
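
Mr. Little's guarantee can be checked numerically. The sketch below is illustrative only: it assumes the functions L(t) = t/5 and B(t) = t/1000 with M = 1000, for which equation (4) gives an opening time of 1/2, and it tries three spending plans for Mr. Big that each total M. The plan names and the midpoint-rule integrator are assumptions of this sketch, not part of the paper.

```python
import math

T0 = 0.5                      # opening time for the assumed L, B and M = 1000

def L(t):
    return t / 5.0            # assumed susceptibility to Mr. Little

def B(t):
    return t / 1000.0         # assumed susceptibility per dollar to Mr. Big

def integrate(f, a, b, steps=200):
    # Midpoint rule; returns 0 when a == b
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def resistance(S, t):
    # Equation (1), integrated in two pieces so that a jump in S at T0
    # never falls inside an integration cell
    exponent = integrate(lambda z: S(z) * B(z), 0.0, min(t, T0))
    if t > T0:
        exponent += integrate(lambda z: S(z) * B(z), T0, t)
    return math.exp(-exponent)

def little_payoff(S):
    # Mixed strategy of equation (5): density -P'(t) on [T0, 1) plus the
    # remaining probability B(T0)/B(1) concentrated at t = 1
    dB = 1.0 / 1000.0
    density = lambda t: B(T0) * dB / B(t) ** 2
    spread = integrate(lambda t: density(t) * resistance(S, t) * L(t), T0, 1.0)
    atom = B(T0) / B(1.0)
    return spread + atom * resistance(S, 1.0) * L(1.0)

def S_opt(t):                 # equation (2) from T0 on, nothing before
    return 1000.0 / t ** 2 if t >= T0 else 0.0

def S_uniform(t):             # hypothetical plan: spend evenly over (0, 1)
    return 1000.0

def S_early(t):               # hypothetical plan: spend everything before T0
    return 2000.0 if t < T0 else 0.0

for plan in (S_opt, S_uniform, S_early):
    print(plan.__name__, round(little_payoff(plan), 4))
```

Under these assumptions the payoff against the constancy plan should come out close to L(1/2) = 0.1, and the other two plans should do no better for Mr. Big.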

We have also worked out more complicated games, such as the one
in which both men's advertising is distributed over time. But the
purpose here has been to present a situation which is illustrative yet
relatively simple. I might close with a numerical example. Suppose that
L(t) = t/5, and B(t) = t/1000. Then if Mr. Big has 1000 dollars to spend his
optimal rate of spending will be $S_0(t) = 1000/t^2$, starting at time
$t_0 = 1/2$; and Mr. Little's probability of success will be $L(t_0) = 1/10$.
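
The numerical example can be reproduced from equations (2) and (4). The following is a minimal sketch, assuming a midpoint-rule integral and a bisection search for the opening time; the function names are illustrative and not taken from the paper.

```python
def L(t):
    return t / 5.0            # Mr. Little's susceptibility, L(t) = t/5

def dL(t):
    return 1.0 / 5.0          # derivative L'(t)

def B(t):
    return t / 1000.0         # Mr. Big's susceptibility per dollar

def S0(t):
    # Equation (2): optimal spending rate once the campaign has opened
    return dL(t) / (L(t) * B(t))

def money_spent(t0, steps=4000):
    # Midpoint-rule approximation to the integral in equation (4)
    h = (1.0 - t0) / steps
    return h * sum(S0(t0 + (k + 0.5) * h) for k in range(steps))

def opening_time(M, lo=0.01, hi=1.0, iterations=50):
    # money_spent falls as t0 rises, so bisection locates money_spent(t0) = M
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if money_spent(mid) > M:
            lo = mid          # still too expensive: open later
        else:
            hi = mid
    return 0.5 * (lo + hi)

t0 = opening_time(1000.0)
print(round(t0, 4), round(L(t0), 4))   # values close to 0.5 and 0.1
```

Changing M in the call to `opening_time` shows the effect noted earlier: a larger budget lets Mr. Big open sooner and lowers L at the opening time.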




DESIGN OF EXPERIMENTS FOR MOST PRECISE SLOPE 
ESTIMATION OR LINEAR EXTRAPOLATION 


CUTHBERT DANIEL
AND 
HEEREMA 


This paper discusses the problems of slope-estimation and
of linear extrapolation when the precision of the Y-measurements
changes with different values of x. Tables are presented
which show, for each of these problems, the optimum placement
of x-values and the optimum distribution of the N observations
for selected relationships between $\sigma$ and x. The
optimum placements are seen to be the same for the two problems,
but for the latter, the optimum distributions depend on
the nearness or remoteness of the extrapolation.


INTRODUCTION AND SUMMARY 


ALTHOUGH scientific and engineering work frequently depends on
linear extrapolation, there appears to be no statistical reference
to extrapolation since the paper of Working and Hotelling.¹ Indeed the
very word extrapolation has become almost a term of opprobrium.
Even for the simpler problem of slope-estimation, only the
homoscedastic case (constant precision of Y-measurements) is regularly
discussed and then only as a problem in estimation, not in terms of
planned experimentation.

The homoscedastic case is felt by the writers to be the exception 
in scientific and engineering work. Indeed, extrapolation is generally 
necessary only in the heteroscedastic case. It is usually just because 
the precision of Y-measurement deteriorates for some range of x-values
that the scientist is compelled to make measurements elsewhere.
For this reason the results offered on “heteroscedastic extrapolation” 
are expected to be the most useful. 

It is often the purpose of an engineering test to determine the slope 
for a relation known from wide previous experience to be linear. Esti- 
mation of a modulus of elasticity for a new batch of metal, of a friction 
factor for a new type of packing, of the Henry’s law constant for an 
untested solute, of the rate of a first-order reaction, are common ex- 
amples. 

1 Journal of the American Statistical Association, Vol. 24, March supplement, pp. 73-85, 1929. Since
this was written, two references have come to our attention. Both are in the volume, Techniques of
Statistical Analysis, by the Statistical Research Group, Columbia University, McGraw-Hill, 1947; both are
by M. Friedman. Article 9 is a special extrapolation. Article 11 is partly a most precise slope estimation.




Table 1 shows for slope estimation the optimum placement (of x-values)
and the optimum distribution (of $n_1 + n_2 = N$ statistically
independent observations) for several precision-conditions (relations
between $\sigma$ and x). The variance of slope, and for some cases the relative
efficiency of the optimum arrangement compared to the more usual
arrangements, are also given. It is seen that the gains in precision may
be considerable.

Table 2 shows the optimum placements and distributions for the
same precision-conditions when extrapolation (to $x_e$) is planned. It is
seen that the optimum placements are the same as for slope estimation,
but that the optimum distributions depend on the remoteness or
nearness of the extrapolation, i.e. on "q".

The limitations of these results should be emphasized even in a
summary. It has been necessary to assume:

1. The relation between x and Y' (the population value of Y) is
known to be linear over the range $x_1$ to $x_2$ for slope estimation,
and over the range $x_1$ to $x_e$ for extrapolation.

2. The relation between x and $\sigma$ is known. (The referees have
pointed out that the same analysis applies if $\sigma_i^2 = \sigma^2 f(x_i)$,
where $f(x_i)$ is known and $\sigma^2$ is unknown.)

3. It is possible to choose the x-values at which Y measurements
are to be made. (This rules out most time series.)

4. It is possible to attain statistical independence in repeated
measurements of Y at fixed x.

5. The x-measurements are essentially exact; only Y is subject
to error.

6. In extrapolation experiments it is further assumed that the
x-value ($x_e$) to which extrapolation is planned is known
beforehand.


TERMINOLOGY

Equal spacing refers to uniform x-spacing of a set of Y-measurements,
with one observation at each x-value.

Equal bunching refers to the grouping of all measurements at two
x-values, $x_1$ and $x_2$, with N/2 measurements at each.

Optimal bunching refers to the grouping of all measurements at two points,
$x_1$ and $x_2$, that satisfy criteria given below. These groupings are optimal
in the sense that they minimize the variance of the estimated slope or
of an extrapolated Y-estimate.


[TABLE 1. SLOPE ESTIMATION. Table body not legible in the source.]

[TABLE 2. LINEAR EXTRAPOLATION. Table body not legible in the source.]

Extrapolation ratio, designated by q, is defined by the equation

$$q = \frac{x_e - x_2}{x_e - x_1},$$

where $x_e$ is the x-value at which Y' is to be estimated. The three
x-values stand in the relation $x_e > x_2 > x_1$ throughout the paper. Thus q
varies from near zero for nearby extrapolations to near unity for remote
extrapolations.



Efficiency of one arrangement compared to another is measured by the 
inverse ratio of the variances of the quantity desired (slope or extra- 
polated Y-value) for the two arrangements. Efficiencies given are all 
greater than unity. 


SLOPE ESTIMATION 


a. Homoscedastic case. When $\sigma$ is constant for all values of x, the
variance of the slope of the least squares line is given in line 1 of Table
1. It is obvious that the slope will be most precisely determined when
an equal number of statistically independent observations of Y are
made at two x-values, $x_1$ and $x_2$, spaced as widely as the physical
situation permits. The gain in precision over the usual equally spaced
arrangement may be expressed by the ratio of the variance-of-slope for
the two cases. This ratio is 3(N-1)/(N+1). Thus, for large N, the standard
error of slope estimated from N equally spaced points will be about 1.7 times
that estimated from the equally bunched arrangement. The ratio of
the number of equally spaced measurements required to give the same
precision (variance) as N measurements equally bunched is also
3(N-1)/(N+1). Thus nearly three times as many equally spaced
measurements as equally bunched ones are required for the same
precision of slope estimation.
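
The ratio 3(N-1)/(N+1) can be confirmed directly from the sum of squared deviations in the slope-variance formula. This is a small sketch with illustrative function names, assuming unit $\sigma$, an even N, and x-values confined to the interval from 0 to 1.

```python
def sum_sq_dev(xs):
    # Sum of squared deviations of the x-values from their mean
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def slope_variance(xs, sigma=1.0):
    # Homoscedastic variance of the least-squares slope
    return sigma ** 2 / sum_sq_dev(xs)

def equally_spaced(n, x1=0.0, x2=1.0):
    step = (x2 - x1) / (n - 1)
    return [x1 + k * step for k in range(n)]

def equally_bunched(n, x1=0.0, x2=1.0):
    # Half the observations at each end (n assumed even)
    return [x1] * (n // 2) + [x2] * (n // 2)

def variance_ratio(n):
    return slope_variance(equally_spaced(n)) / slope_variance(equally_bunched(n))

for n in (10, 100):
    print(n, variance_ratio(n), 3 * (n - 1) / (n + 1))
```

For N = 100 the ratio is a little under 3, and its square root is a little over 1.7, the factor quoted for the standard error.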

b. Heteroscedastic case. When $\sigma$ varies with x, it is still true that the
slope is most precisely determined when the measurements are bunched
at two points. However it may be that increasing the spread between
$x_1$ and $x_2$ does not decrease the variance of slope. The criterion for
deciding how far apart $x_1$ and $x_2$ may be placed is given in line 5 of
Table 1, in terms of the rate of change of $\sigma$ with x. These optimum
positions only exist when the curve of $\sigma$ versus x is concave upward. If
the relation between $\sigma$ and x is known graphically then it is easy to
locate $x_1$ and $x_2$. We give two examples: in the first $\sigma$ increases
monotonically with x, in the second there is a minimum in the $\sigma$ vs. x curve.

When $\sigma$ increases with x ($d^2\sigma/dx^2 > 0$), then $x_1$ should be chosen at
the smallest attainable $\sigma$-value. A plot is then made of $d\sigma/dx$ versus x.
With the same x-scale, the quantity $(\sigma_1 + \sigma_2)/(x_2 - x_1)$ is plotted. The
point of intersection of the two curves determines $\sigma_2$, $x_2$.

When there is a minimum in the $\sigma$-x curve, a value of $x_1$ less than the
value at which $\sigma$ is a minimum is chosen. Then in turn the corresponding
$\sigma_1$, $(d\sigma/dx)_1$, $(d\sigma/dx)_2$ (one of these derivatives is always the
negative of the other for an optimum), and finally $\sigma_2$ are found. The quantity
$(\sigma_1 + \sigma_2)/(x_2 - x_1)$ can then be computed and plotted. This process may
be repeated for several values of $x_1$. The critical value of $x_1$ is that which
makes the above ratio equal to the corresponding $(d\sigma/dx)_2$.

In some cases the critical values of $x_1$ and $x_2$ can be given explicitly.
Some examples of this sort are shown in lines 9 to 12 of Table 1. When
$\sigma$ is parabolic in x (line 9), it is interesting to note that no measurements
are taken at the position of minimum $\sigma$ but rather at the two x-values for
which $\sigma$ is twice its minimum. When $\sigma$ can be represented by one
parabola on one side of its minimum and by another on the other side
(line 10, Table 1) the critical values of x can still be given explicitly.

Two examples of explicit solutions are also given when $\sigma$ is a
monotone increasing function of x. Both for parabolic and exponential
increase the solutions are quite simple; in the former case (line 11) $x_2$
is an invariant multiple of $x_1$ and in the latter case (line 12) $x_2$ lies a
constant distance to the right of $x_1$.
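
The remark about the parabolic case can be illustrated by a direct search. The sketch below rests on two assumptions that are not copied from Table 1: a hypothetical curve $\sigma(x) = a + c(x - x_0)^2$ with made-up constants, and the fact that with the allocation $n_1/n_2 = \sigma_1/\sigma_2$ the two-point slope variance is proportional to the square of $(\sigma_1 + \sigma_2)/(x_2 - x_1)$, so that minimizing this quantity minimizes the variance.

```python
A_MIN, C, X0 = 2.0, 0.5, 3.0   # hypothetical constants: minimum sigma, curvature, location

def sigma(x):
    return A_MIN + C * (x - X0) ** 2

def criterion(x1, x2):
    # Quantity to be minimized over the placement (x1, x2)
    return (sigma(x1) + sigma(x2)) / (x2 - x1)

def best_placement(step=0.01, span=5.0):
    # Grid search with x1 to the left and x2 to the right of the minimum
    n = int(round(span / step))
    best = None
    for i in range(1, n + 1):
        x1 = X0 - i * step
        for j in range(1, n + 1):
            x2 = X0 + j * step
            value = criterion(x1, x2)
            if best is None or value < best[0]:
                best = (value, x1, x2)
    return best[1], best[2]

x1, x2 = best_placement()
print(x1, x2, sigma(x1) / A_MIN, sigma(x2) / A_MIN)
```

With these constants the search should settle near x = 1 and x = 5, where $\sigma$ is about twice its minimum value, in agreement with the statement for line 9.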


LINEAR EXTRAPOLATION 


As Table 2 indicates, the optimum x-placements for extrapolation
are identical with those for optimum slope-estimation. The distribution
of the $n_1 + n_2$ measurements is however modified. As line 5 indicates,
the optimum distribution is now $n_1/n_2 = q\sigma_1/\sigma_2$. For remote
extrapolations, i.e. as q approaches unity, this ratio approaches the
most-precise-slope requirement since the "aim" is then of controlling importance.
For nearby extrapolations, i.e. when q is near zero, the precision of measurement
of Y at $x_2$ becomes of controlling importance and hence $n_1/n_2$
approaches zero.

Any two-parameter function which may be transformed into a linear
relation, e.g. $Y = a e^{bx}$, may be handled by these methods. It is
necessary only to change the weights of the x's to correspond. The
well-known $y^2$-weighting for the exponential equation when expressed in
logarithms is an example.

The criteria for placement and for the distribution $n_1/n_2$ are
independent; thus for any placement $x_1$, $x_2$ the optimum $n_1/n_2$ is that
given in the third columns of Tables 1 and 2.
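
The allocation rule for extrapolation can be checked by brute force. The sketch assumes the standard variance of a value extrapolated from a line through two group means (it is not copied from Table 2), and the placement, $\sigma$-values, and N below are made-up illustrative numbers.

```python
def extrap_variance(n1, n2, x1, x2, xe, s1, s2):
    # Variance of the value extrapolated to xe from the line through the
    # means of n1 observations at x1 and n2 observations at x2
    d = x2 - x1
    return ((xe - x2) ** 2 * s1 ** 2 / n1 + (xe - x1) ** 2 * s2 ** 2 / n2) / d ** 2

def best_split(N, x1, x2, xe, s1, s2):
    # Try every division of the N observations between the two points
    return min(range(1, N),
               key=lambda n1: extrap_variance(n1, N - n1, x1, x2, xe, s1, s2))

# Hypothetical design: x1 = 0, x2 = 1, extrapolation to xe = 3
x1, x2, xe, s1, s2, N = 0.0, 1.0, 3.0, 1.0, 2.0, 1000
q = (xe - x2) / (xe - x1)
n1 = best_split(N, x1, x2, xe, s1, s2)
print(n1, N - n1, n1 / (N - n1), q * s1 / s2)
```

Here q = 2/3 and $q\sigma_1/\sigma_2$ = 1/3, so the search should place one quarter of the observations at the first point and three quarters at the second.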




The results of this paper require further generalization in several 
directions. It will be useful to have equations corresponding to those 
given above: 


a. when $\sigma^2$ is not known exactly but must be estimated with a
small number of degrees of freedom,

b. when x as well as y is a random variable,

c. when it is desired to estimate the slopes of a plane or hyper- 
plane, or to extrapolate with minimum variance of prediction, 

d. giving optimal arrangements for other curves, particularly 
for the parabola, both for parameter estimation and for ex- 
trapolation. 


The writers gratefully acknowledge the early work on this problem 
by Robert Bechhofer, and the valuable criticism of George A. Garrett. 
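
MATHEMATICAL APPENDIX

The equations for the variance of slope are well-known. For the
homoscedastic case, $\sigma$ constant:

$$\sigma^2(b) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{1}$$

where
b = slope of the least-squares line,
$\sigma^2$ = variance of y at fixed x,
$\sigma^2(b)$ = variance of slope,
$\sum (x_i - \bar{x})^2$ = summation of the squares of deviations of the
x-values from their mean, $\bar{x}$.

For the heteroscedastic case, $\sigma_i$ a known function of $x_i$:

$$\sigma^2(b) = \frac{1}{\sum_{i=1}^{N} w_i (x_i - \bar{x}_w)^2} \tag{2}$$

where $w_i = 1/\sigma_i^2$ = weight of ith point, and $\bar{x}_w$ is the weighted
mean of the $x_i$.
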


Differentiating equation (2) and setting the derivative equal to zero:

(3)    [equation not legible in the source]

Equation (3) gives the critical values of x, in terms of the
rate-of-change of $\sigma$ with x, at which data should be taken so as to minimize
the variance of the estimated slope. There will generally be two values
of x satisfying this equation; calling them $x_1$ and $x_2$, we have

$$-\left(\frac{d\sigma}{dx}\right)_1 = \left(\frac{d\sigma}{dx}\right)_2 = \frac{\sigma_1 + \sigma_2}{x_2 - x_1}. \tag{4}$$

Differentiating equation (2) with respect to $n_1$, the number of
statistically independent measurements to be made at $x_1$ ($n_2 = N - n_1$, the
number to be made at $x_2$), and setting the derivative equal to zero:

$$\frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2}. \tag{5}$$



The explicit solutions given in lines 9-12 of Table 1 for certain
relations between $\sigma$ and x are all derived by the same methods. The
function of x given in the first column is substituted for $\sigma$ in the equations
given in line 5 to give the optimum placement, distribution of
observations, and the corresponding variance of slope.

The expression for the variance of $Y_e$, the estimated value of Y at
$x_e$, a point outside the range of the measurements, is given in line 1 of
Table 2 for the homoscedastic case, and in line 4 when $\sigma$ varies with
x. The derivative of the latter expression with respect to $x_i$ is set equal
to zero. Two equations result:

(6)    [equations not legible in the source; they involve a weighted sum
of squares $S_w$, formed from the terms $w_i (x_i - \bar{x}_w)^2$, and the
derivatives $dw_i/dx_i$]


It is likely that only two roots of equations (6) will occur in the
range of accessible x values. By direct substitution in the equation of
line 4, Table 2, the variance of $Y_e$ for bunched points will be found to
be

$$\sigma^2(Y_e) = \frac{1}{(x_2 - x_1)^2}\left[\frac{(x_e - x_2)^2}{n_1 w_1} + \frac{(x_e - x_1)^2}{n_2 w_2}\right]. \tag{7}$$

This may be differentiated with respect to $n_1$ (remembering that
$n_2 = N - n_1$), and the derivative set equal to zero. We find

$$\frac{n_1}{n_2} = q\,\sqrt{\frac{w_2}{w_1}}. \tag{8}$$

Substituting equation (8) in (6) and (7) we find the equations given
for optimum placement and for minimum $\sigma^2(Y_e)$ in line 5, Table 2.

The corresponding equations given in lines 7, 9, 10, 11, and 12 of 
Table 2 are found by straightforward substitution of the equations 
for o given in the first column, into the equations of line 5. The condi- 
tions in lines 2, 6 and 8 are not optimum but are given for comparison 
through the efficiencies shown in the last columns. 




SEQUENTIAL SAMPLING FROM FINITE LOTS 
WHEN THE PROPORTION DEFECTIVE 
IS SMALL 


J. H. CHUNG*


The purpose of this article is to study a sequential plan for 
sampling from a finite population which contains a very small 
proportion defective. A number of working formulae are de- 
veloped and suggestions offered for their efficient use in prac- 
tice. The actual operation of the plan is illustrated in a worked 
example, in which the data and results of several tests are 
recorded. 


CONSIDER a lot of N units, of which D are defective. Then, in random
samples drawn from this lot, the sampling distribution of the
observed number of defective units is given by the terms of a hypergeo- 
metric series. When the proportion defective, D/N, is not too small, 
estimates of the desired precision can usually be obtained by sampling 
only a small proportion of the population and in such cases, the hyper- 
geometric probabilities may be approximated satisfactorily by binomial 
probabilities, thus avoiding heavy computation. On the other hand, when 
the proportion defective is very small, a large fraction of the population 
must inevitably be sampled and the binomial approximation becomes 
quite unsatisfactory. 

This last case is the subject of this study, in which a sequential 
sampling plan is investigated without the use of the binomial approxi- 
mation. 

In what follows we shall suppose that the sequential test for acceptance
or rejection of the lot is equivalent to a sequential test of the
hypothesis $H_0$, that D is equal to some specified value $D_0$, against the
alternative hypothesis $H_1$, that D is equal to a specified value $D_1$
($D_1 > D_0$). $D_0$ and $D_1$ are chosen so that for values of $D \le D_0$ we prefer
to accept the lot, while for values of $D \ge D_1$ we prefer to reject the lot.
For values of D between $D_0$ and $D_1$, we do not care particularly which
decision is made. If we let $d_m$ be the number of defectives in the set of
the first m observations, then the probability ratio $P_{1m}/P_{0m}$ is defined
by:


* Department of Mathematical Statistics, Ontario Research Foundation. This study was conducted 
under the direction of Dr. D. B. DeLury and was supported in large part by a grant from the Special 
Committee on Applied Mathematical Statistics, National Research Council of Canada. 




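$$\frac{P_{1m}}{P_{0m}} = \frac{\text{Probability of obtaining the observed sample of size } m \text{ containing } d_m \text{ defectives when } H_1 \text{ is true}}{\text{Probability of obtaining the observed sample of size } m \text{ containing } d_m \text{ defectives when } H_0 \text{ is true}}$$

Writing $N - D_1 = Q_1$, $N - D_0 = Q_0$ and imposing the restriction that $D_0$
and $D_1$ be small compared with N, the following two formulae give
close approximations to $\log P_{1m}/P_{0m}$. These formulae are derived in
Appendix 1.

$$\begin{aligned}
\log \frac{P_{1m}}{P_{0m}} \approx{}& \log D_1! - \log D_0! + \log (D_0 - d_m)! - \log (D_1 - d_m)! \\
&+ (Q_0 - Q_1)\,\{\log (Q_0 + Q_1 + 1 + 2d_m) + \log (Q_0 + Q_1 + 1 - 2m) - 2 \log (Q_0 + Q_1 + 1)\}
\end{aligned} \tag{1}$$

$$\begin{aligned}
\log \frac{P_{1m}}{P_{0m}} \approx{}& \log D_1! - \log D_0! + \log (D_0 - d_m)! - \log (D_1 - d_m)! \\
&+ (Q_0 - Q_1)\,\{\log (Q_0 + Q_1 + 1 + 2d_m - 2m) - \log (Q_0 + Q_1 + 1)\}
\end{aligned} \tag{2}$$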


A possible criticism of (2) might be that m and $d_m$ do not occur here in
separate factors, but in the method of constructing sampling tables,
to be suggested later, an expression such as (2), besides presenting no
added difficulty, actually leads to slightly less calculation and gives a
somewhat better approximation than (1). In practice (1) is sufficiently
good provided $d_m$ is very small, but becomes progressively cruder as
$d_m$ increases.
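
The relative quality of the two approximations can be examined against the exact hypergeometric ratio. The following is a sketch under stated assumptions: logarithms are taken to base 10, factorials are evaluated through `math.lgamma`, and the lot and sample sizes are illustrative values rather than figures from the paper.

```python
import math

def log10_factorial(n):
    return math.lgamma(n + 1) / math.log(10)

def exact_log_ratio(N, D0, D1, m, d):
    # Factors common to both hypotheses cancel, leaving only the terms
    # that depend on D
    def core(D):
        Q = N - D
        return (log10_factorial(D) - log10_factorial(D - d)
                + log10_factorial(Q) - log10_factorial(Q - (m - d)))
    return core(D1) - core(D0)

def defective_terms(D0, D1, d):
    return (log10_factorial(D1) - log10_factorial(D0)
            + log10_factorial(D0 - d) - log10_factorial(D1 - d))

def approx_1(N, D0, D1, m, d):
    # Formula (1): m and d appear in separate factors
    Q0, Q1 = N - D0, N - D1
    c = Q0 + Q1 + 1
    return defective_terms(D0, D1, d) + (Q0 - Q1) * (
        math.log10(c + 2 * d) + math.log10(c - 2 * m) - 2 * math.log10(c))

def approx_2(N, D0, D1, m, d):
    # Formula (2)
    Q0, Q1 = N - D0, N - D1
    c = Q0 + Q1 + 1
    return defective_terms(D0, D1, d) + (Q0 - Q1) * (
        math.log10(c + 2 * d - 2 * m) - math.log10(c))

args = (1000, 6, 14, 300, 2)   # illustrative N, D0, D1, m, d
exact = exact_log_ratio(*args)
err_1 = abs(approx_1(*args) - exact)
err_2 = abs(approx_2(*args) - exact)
print(exact, err_1, err_2)
```

For these values the error of (2) should be markedly smaller than that of (1), in line with the remark that (1) becomes cruder as the number of defectives found grows.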

If our tolerated risks of errors of the first and second kinds are given
by $\alpha$ and $\beta$ respectively, i.e. if we are willing to tolerate a risk $\alpha$ of
rejecting $H_0$ wrongly, and a risk $\beta$ of accepting $H_0$ wrongly, then the
sequential probability ratio test requires that the hypothesis $H_0$ (and
the lot) be accepted the first time that $P_{1m}/P_{0m} \le B$, that it be rejected
the first time that $P_{1m}/P_{0m} \ge A$, and that inspection be continued so
long as $B < P_{1m}/P_{0m} < A$, where $A = (1-\beta)/\alpha$, and $B = \beta/(1-\alpha)$.¹ In the
hypergeometric case, then, the three sets of inequalities that determine
acceptance of the lot, rejection of the lot, or continuation of inspection,
respectively, may be put in the form [using (2)]:


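$$\log (Q_0 + Q_1 + 1 + 2d_m - 2m) \le \log (Q_0 + Q_1 + 1) + \frac{\log (D_1 - d_m)! - \log (D_0 - d_m)!}{Q_0 - Q_1} - \frac{\log D_1! - \log D_0! - \log B}{Q_0 - Q_1} \tag{3}$$

$$\log (Q_0 + Q_1 + 1 + 2d_m - 2m) \ge \log (Q_0 + Q_1 + 1) + \frac{\log (D_1 - d_m)! - \log (D_0 - d_m)!}{Q_0 - Q_1} - \frac{\log D_1! - \log D_0! - \log A}{Q_0 - Q_1} \tag{4}$$

$$\begin{aligned}
&\log (Q_0 + Q_1 + 1) + \frac{\log (D_1 - d_m)! - \log (D_0 - d_m)!}{Q_0 - Q_1} - \frac{\log D_1! - \log D_0! - \log B}{Q_0 - Q_1} \\
&\quad < \log (Q_0 + Q_1 + 1 + 2d_m - 2m) \\
&\quad < \log (Q_0 + Q_1 + 1) + \frac{\log (D_1 - d_m)! - \log (D_0 - d_m)!}{Q_0 - Q_1} - \frac{\log D_1! - \log D_0! - \log A}{Q_0 - Q_1}
\end{aligned} \tag{5}$$

1 Reference 1, pp. 40-44.
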

Similar inequalities can also be obtained by using (1) instead of (2). The
amount of computation involved in any particular case is not excessive,
although it does depend on the magnitude of $D_0$. An efficient way of
handling it appears to be as follows. For specified values of N, $D_0$, $D_1$,
$\alpha$ and $\beta$, assign to $d_m$ in turn the values 0, 1, 2, ..., and calculate for
each $d_m$ the minimum value of m satisfying (3) and the maximum value
of m satisfying (4). For any $d_m$, these m's are, respectively, the minimum
cumulative sample size at which acceptance of the lot is possible
and the maximum cumulative sample size at which rejection is possible.
Appendix 2 contains an illustration of the construction and use of such
a table.

We next study the effect of changing $N$, while keeping $D_0$, $D_1$, $A$, $B$,
and $d_m$ fixed, on the maximum $m$ for which, given $d_m$, we would reject
the lot and on the minimum $m$ for which we would accept it. For any
particular value of $N$ we have the following inequality holding for the
rejection of the lot [from (4)]:

$$\log (2N - D_0 - D_1 + 1 + 2d_m - 2m) \ge \log (2N - D_0 - D_1 + 1) - M,$$

where $M$ does not depend on $N$. Assuming $N$ to be large as compared
with $D_0$ and $D_1$, and therefore with $d_m$, we have approximately

$$2N - 2m \ge 10^{-M}\, 2N, \quad \text{or} \quad m \le N\,(1 - 10^{-M}).$$


This suggests that if we change $N$ by some multiplicative factor the
effect will be to change the rejection $m$ for any fixed $d_m$ by essentially
the same factor. The same considerations apply in the case of the
acceptance $m$'s. In other words, then, the indications are that, for fixed
$D_0$, $D_1$, $A$, $B$, and $d_m$, the $m$'s of the two tables bear practically the
same ratio to each other as do the $N$'s of the two tables, and hence from any
one table we can immediately draw up the table corresponding to any
other value of $N$. Table A, constructed for $D_0=6$, $D_1=14$, $\alpha=\beta=.10$,
and $N=1,000$, 5,000, 30,000 respectively, shows the extent to which
this approximate proportionality holds.

[TABLE A. Maximum $m$ for rejection and minimum $m$ for acceptance, with
their ratios, for $N=1,000$, $N=5,000$, and $N=30,000$; the tabulated values
are illegible in this copy.]
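The proportionality can also be checked numerically. The following Python sketch is an illustration only, not part of the original computations: the function name `boundary_m` and the choice $d_m=3$ are assumptions made for the example, and `math.lgamma` takes the place of a table of logarithms of factorials. It solves (3) and (4) for $m$ at the three lot sizes of Table A and prints the ratios of the resulting $m$'s to those for $N=1,000$.

```python
import math

def boundary_m(N, D0, D1, alpha, beta, d):
    """Return (max m for rejection, min m for acceptance) for a given d_m.

    Solves inequalities (4) and (3) for m, using log-gamma for log factorials.
    """
    lf = lambda n: math.lgamma(n + 1) / math.log(10.0)   # log10(n!)
    A, B = (1 - beta) / alpha, beta / (1 - alpha)
    S = 2 * N - D0 - D1 + 1      # Q0 + Q1 + 1
    k = D1 - D0                  # Q0 - Q1
    core = math.log10(S) + (lf(D1 - d) - lf(D0 - d) - lf(D1) + lf(D0)) / k
    m_reject = math.floor((S + 2 * d - 10 ** (core + math.log10(A) / k)) / 2)
    m_accept = math.ceil((S + 2 * d - 10 ** (core + math.log10(B) / k)) / 2)
    return m_reject, m_accept

# Illustrative row d_m = 3 for the parameters of Table A.
base_rej, base_acc = boundary_m(1000, 6, 14, 0.10, 0.10, d=3)
for N in (5000, 30000):
    rej, acc = boundary_m(N, 6, 14, 0.10, 0.10, d=3)
    print(N, round(rej / base_rej, 2), round(acc / base_acc, 2))
```

Under these assumptions the printed ratios should lie close to 5 and 30, the ratios of the lot sizes, with the small departures coming from the term $2d_m$ and from rounding $m$ to an integer.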

The tables are readily adaptable to inspection in groups, where the
group size would in practice be decided upon from considerations of
expediency. For any group size $n$, the acceptance $m$'s could be rounded
up to the next multiple of $n$, while the rejection $m$'s could be rounded
down to the next multiple of $n$, with both systems of rounding tending
to decrease the risks $\alpha$ and $\beta$ of wrong decisions. If inspection in
groups should be decided upon, then the sampling tables could advantageously
be remodelled along the same lines² as those for sequential sampling
from a binomial population. This is illustrated in Appendix 2.

In conclusion, we might remark that the form of the denominator of
the probability ratio $P_{1m}/P_{0m}$³ does not permit values of $d_m$ greater
than $D_0$ (since $(D_0-d_m)!$ with $D_0-d_m<0$ has no meaning). Indeed,
when $d_m \ge D_0$, the nature of the problem has changed, since there is no
longer any risk of rejecting a lot containing fewer than $D_0$ defectives.
New criteria to serve as a basis for continuing the sequential sampling
process could be formulated, but this is not attempted here, since the
choice of criteria must depend essentially on the particular circumstances
in which the sampling plan is employed.


APPENDIX 1
DEVELOPMENT OF USABLE APPROXIMATIONS TO THE RATIO $P_{1m}/P_{0m}$


If we denote by $H_0$ the hypothesis that the true number of defectives
in the lot, $D$, is equal to a particular value $D_0$ and by $H_1$ the hypothesis
that $D$ is equal to $D_1$ ($D_1 > D_0$), then we have

$$\frac{P_{1m}}{P_{0m}} = \frac{\text{Probability of obtaining the observed sample of size } m \text{ containing } d_m \text{ defectives when } H_1 \text{ is true}}{\text{Probability of obtaining the observed sample of size } m \text{ containing } d_m \text{ defectives when } H_0 \text{ is true}}$$

$$= \frac{\dfrac{D_1(D_1-1)(D_1-2)\cdots(D_1-d_m+1)\,(N-D_1)(N-D_1-1)\cdots(N-D_1-m+d_m+1)}{N(N-1)\cdots(N-m+1)}}{\dfrac{D_0(D_0-1)(D_0-2)\cdots(D_0-d_m+1)\,(N-D_0)(N-D_0-1)\cdots(N-D_0-m+d_m+1)}{N(N-1)\cdots(N-m+1)}}$$


² Reference 1, pp. 92-93.
³ Appendix 1.


562 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1950 


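If we write $N-D_1=Q_1$, $N-D_0=Q_0$, and observe that $Q_0>Q_1$, then the
above becomes

$$\frac{P_{1m}}{P_{0m}} = \frac{D_1!\,(D_0-d_m)!}{D_0!\,(D_1-d_m)!} \cdot \frac{(Q_0-m+d_m)(Q_0-m+d_m-1)\cdots(Q_1-m+d_m+1)}{Q_0(Q_0-1)\cdots(Q_1+1)} \tag{6}$$

Multiplying this ratio by the unity factor

$$\frac{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}$$

we obtain

$$\frac{P_{1m}}{P_{0m}} = \frac{D_1!\,(D_0-d_m)!}{D_0!\,(D_1-d_m)!} \cdot \frac{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}{Q_0(Q_0-1)\cdots(Q_1+1)} \cdot \frac{(Q_0-m+d_m)(Q_0-m+d_m-1)\cdots(Q_1-m+d_m+1)}{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}$$

If we now impose the restrictions that $N$ be large and that $D_0$ and $D_1$
be relatively small, so that $Q_0$ and $Q_1$ are also large, then the product
of the $Q_0-Q_1$ $(=D_1-D_0)$ factors

$$\left(1-\frac{m}{Q_0+d_m}\right)\left(1-\frac{m}{Q_0+d_m-1}\right)\cdots\left(1-\frac{m}{Q_1+d_m+1}\right)$$

is very nearly equal to

$$\left(1-\frac{m}{A}\right)^{Q_0-Q_1}$$

where $A$ = arithmetic mean of the $(Q_0-Q_1)$ consecutive integers from
$Q_1+d_m+1$ to $Q_0+d_m$ inclusive, that is, $A=(Q_0+Q_1+1+2d_m)/2$. This
yields the very close approximation

$$\frac{P_{1m}}{P_{0m}} \approx \frac{D_1!\,(D_0-d_m)!}{D_0!\,(D_1-d_m)!} \cdot \frac{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}{Q_0(Q_0-1)\cdots(Q_1+1)} \left(1-\frac{2m}{Q_0+Q_1+1+2d_m}\right)^{Q_0-Q_1} \tag{7}$$

If it should be desirable to have $m$ and $d_m$ ultimately in distinct factors,
then from (7) we obtain the further approximation

$$\frac{P_{1m}}{P_{0m}} \approx \frac{D_1!\,(D_0-d_m)!}{D_0!\,(D_1-d_m)!} \cdot \frac{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}{Q_0(Q_0-1)\cdots(Q_1+1)} \left(1-\frac{2m}{Q_0+Q_1+1}\right)^{Q_0-Q_1} \tag{7a}$$

which is sufficiently good in practice provided $d_m$ is very small. By the
same process as above we have also

$$\frac{(Q_0+d_m)(Q_0+d_m-1)\cdots(Q_1+d_m+1)}{Q_0(Q_0-1)\cdots(Q_1+1)} \approx \left(\frac{Q_0+Q_1+1+2d_m}{Q_0+Q_1+1}\right)^{Q_0-Q_1} \tag{8}$$

Using approximations (7a) and (8) and taking logarithms we obtain

$$\log \frac{P_{1m}}{P_{0m}} = \log D_1! - \log D_0! + \log (D_0-d_m)! - \log (D_1-d_m)! + (Q_0-Q_1)\{\log (Q_0+Q_1+1+2d_m) + \log (Q_0+Q_1+1-2m) - 2\log (Q_0+Q_1+1)\} \tag{1}$$

whereas approximations (7) and (8) yield

$$\log \frac{P_{1m}}{P_{0m}} = \log D_1! - \log D_0! + \log (D_0-d_m)! - \log (D_1-d_m)! + (Q_0-Q_1)\{\log (Q_0+Q_1+1+2d_m-2m) - \log (Q_0+Q_1+1)\} \tag{2}$$

We note in passing that, so far as the derivation of (2) itself is concerned,
this could have been accomplished directly from (6) by replacing the ratio

$$\frac{(Q_0-m+d_m)(Q_0-m+d_m-1)\cdots(Q_1-m+d_m+1)}{Q_0(Q_0-1)\cdots(Q_1+1)}$$

by the approximate ratio

$$\frac{\left(\dfrac{Q_0+Q_1+1+2d_m-2m}{2}\right)^{Q_0-Q_1}}{\left(\dfrac{Q_0+Q_1+1}{2}\right)^{Q_0-Q_1}}$$

i.e., by the ratio of the $(Q_0-Q_1)$th powers of the arithmetic means in
the numerator and the denominator.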
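The closeness of approximations (1) and (2) can be examined against the exact ratio for any particular case. The Python sketch below is illustrative only: the function names are assumptions of this example, `math.lgamma` is used for the logarithms of factorials, and the test point ($N=7,500$, $D_0=21$, $D_1=29$, $m=1,900$, $d_m=11$) is borrowed from the worked example of Appendix 2.

```python
import math

def lf(n):
    """Common logarithm of n!, via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(10.0)

def exact_log_ratio(N, D0, D1, m, d):
    """log10 of P_1m / P_0m from the exact hypergeometric products."""
    Q0, Q1 = N - D0, N - D1
    log_p1 = lf(D1) - lf(D1 - d) + lf(Q1) - lf(Q1 - m + d)
    log_p0 = lf(D0) - lf(D0 - d) + lf(Q0) - lf(Q0 - m + d)
    return log_p1 - log_p0   # the common factor N(N-1)...(N-m+1) cancels

def approx_log_ratio(N, D0, D1, m, d, separate_factors=False):
    """Approximation (2), or approximation (1) when separate_factors is True."""
    Q0, Q1 = N - D0, N - D1
    S = Q0 + Q1 + 1
    head = lf(D1) - lf(D0) + lf(D0 - d) - lf(D1 - d)
    if separate_factors:     # form (1): m and d_m in distinct factors
        tail = math.log10(S + 2 * d) + math.log10(S - 2 * m) - 2 * math.log10(S)
    else:                    # form (2)
        tail = math.log10(S + 2 * d - 2 * m) - math.log10(S)
    return head + (Q0 - Q1) * tail

exact = exact_log_ratio(7500, 21, 29, 1900, 11)
err_2 = abs(exact - approx_log_ratio(7500, 21, 29, 1900, 11))
err_1 = abs(exact - approx_log_ratio(7500, 21, 29, 1900, 11, separate_factors=True))
print(exact, err_2, err_1)
```

At this point the error of (2) should be far smaller than that of (1), in keeping with the remark that (1) becomes cruder as $d_m$ increases.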




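APPENDIX 2
A WORKED EXAMPLE

If we take

$$N = 7{,}500, \quad D_0 = 21, \quad D_1 = 29, \quad \alpha = \beta = .10,$$

so that

$$Q_0 = N - D_0 = 7{,}479, \quad Q_1 = N - D_1 = 7{,}471, \quad A = \frac{1-\beta}{\alpha} = 9, \quad B = \frac{\beta}{1-\alpha} = \frac{1}{9},$$

the inequalities (3), (4), and (5) become, respectively:

$$\log (14{,}951+2d_m-2m) \le 2.65061 + \frac{\log (29-d_m)! - \log (21-d_m)!}{8} \tag{3'}$$

$$\log (14{,}951+2d_m-2m) \ge 2.88918 + \frac{\log (29-d_m)! - \log (21-d_m)!}{8} \tag{4'}$$

$$2.65061 + \frac{\log (29-d_m)! - \log (21-d_m)!}{8} < \log (14{,}951+2d_m-2m) < 2.88918 + \frac{\log (29-d_m)! - \log (21-d_m)!}{8} \tag{5'}$$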


So long as (5)' is satisfied by the successive observed pairs of values of
$m$ and $d_m$, we continue sampling, but the first time it is not satisfied,
sampling terminates, and we accept the lot if (3)' is satisfied at the final
stage, while we reject if (4)' is satisfied. (3)' and (4)' are used to draw
up the following sampling table, and the only tables needed in the
computation are five-place logarithms and a table of logarithms of
factorials. We assign to $d_m$ in succession the values 0, 1, 2, ..., 21 and
calculate for each value the corresponding maximum value of $m$ satisfying
(4)' and corresponding minimum value of $m$ satisfying (3)'. These
$m$'s appear below in the columns headed, respectively: "Corresponding
Maximum $m$ Possible for Rejection" (corresponding to any particular
$d_m$), and "Corresponding Minimum $m$ Possible for Acceptance." The
values of $d_m$ terminate at $d_m=21$ because $(21-d_m)!$ has no meaning for
$d_m>21$; for $d_m=21$, $(21-d_m)!=1$.
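The construction just described can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure actually followed: the function name `sampling_table` is hypothetical, and `math.lgamma` stands in for the printed table of logarithms of factorials, so that an entry might differ by a unit from one obtained with five-place logarithms.

```python
import math

LOG10 = math.log(10.0)

def log10_factorial(n):
    """Common logarithm of n!, via the log-gamma function."""
    return math.lgamma(n + 1) / LOG10

def sampling_table(N, D0, D1, alpha, beta):
    """Rows (d_m, max m for rejection or None, min m for acceptance), d_m = 0..D0."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    Q0, Q1 = N - D0, N - D1
    S = Q0 + Q1 + 1
    k = Q0 - Q1
    # Part of the right-hand sides of (3) and (4) that does not depend on d_m.
    const = math.log10(S) - (log10_factorial(D1) - log10_factorial(D0)) / k
    rows = []
    for d in range(D0 + 1):
        shift = (log10_factorial(D1 - d) - log10_factorial(D0 - d)) / k
        accept_bound = 10 ** (const + shift + math.log10(B) / k)
        reject_bound = 10 ** (const + shift + math.log10(A) / k)
        # (3): S + 2d - 2m <= accept_bound, so m is at least the ceiling below.
        m_accept = math.ceil((S + 2 * d - accept_bound) / 2)
        # (4): S + 2d - 2m >= reject_bound, so m is at most the floor below.
        m_reject = math.floor((S + 2 * d - reject_bound) / 2)
        # A sample holding d defectives has at least d units; otherwise no rejection m.
        rows.append((d, m_reject if m_reject >= d else None, m_accept))
    return rows
```

Calling `sampling_table(7500, 21, 29, 0.10, 0.10)` should give rows agreeing with Table I to within a unit, with `None` in the rejection column wherever the table shows a dash.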

In using such a table, we need to record the cumulative sample size
$m$ at which each successive defective was reached. For each value of $d_m$
reached, we compare the cumulative sample size $m$ at that stage with
the two values of $m$ on the same row as $d_m$ in Columns (2) and (3). If
our $m$ is less than the maximum $m$ prescribed for rejection we reject the
lot, while if it exceeds the minimum $m$ for acceptance, we accept the
lot. If our $m$ lies between these two values, we continue sampling until
one of two things happens: either we attain the minimum $m$ for acceptance
for that value of $d_m$ without finding another defective, in which
case we accept the lot; or we find another defective before attaining
the minimum $m$, in which case we pass to the next higher value of $d_m$
in the table and proceed as before.


TABLE I

CRITERIA FOR ACCEPTANCE OR REJECTION WHEN
$N=7,500$, $\alpha=\beta=.10$, $D_0=21$, $D_1=29$

        (1)                      (2)                        (3)
Accumulated Observed    Corresponding Maximum m    Corresponding Minimum m
Number of Defectives    Possible for Rejection     Possible for Acceptance
        d_m
         0                        -                        1,796
         1                        -                        2,021
         2                        -                        2,247
         3                        -                        2,473
         4                        -                        2,699
         5                        -                        2,925
         6                        -                        3,151
         7                      371                        3,377
         8                      762                        3,604
         9                    1,154                        3,831
        10                    1,547                        4,058
        11                    1,940                        4,285
        12                    2,334                        4,513
        13                    2,729                        4,741
        14                    3,125                        4,970
        15                    3,522                        5,200
        16                    3,922                        5,431
        17                    4,325                        5,664
        18                    4,732                        5,900
        19                    5,147                        6,140
        20                    5,576                        6,388
        21                    6,038                        6,655


In the preceding table, the dashes in Column (2) indicate that there is no
maximum $m$ for rejection corresponding to $d_m=0, 1, 2, \ldots, 6$; i.e., the
lot cannot be rejected on the basis of finding 6 or fewer defectives
regardless of the sample size. If no defectives have been found after
inspecting 1,796 units we accept the lot, or failing that, if only one
defective has been found after 2,021 units, we accept the lot, etc. Going
farther along, if no decision has been reached with the finding of the 10th
defective, and we find the 11th after inspecting more than 1,940 units, but
less than 4,285 units, then we would accept the lot if we did not find
another defective after 4,285 units in all, whereas we would reject the lot
if we find the 12th defective after inspecting less than 2,334 units. By
plotting the data of Table I we obtain a graphical description of this
sampling plan (Figure 1).


FIGURE 1
CRITERIA FOR ACCEPTANCE OR REJECTION.
[Plotted from Table I: number of defectives observed against number of
units drawn (0 to 7,500), with the regions marked "Reject" and "Accept."
Above the line at $D_0$ defectives the risk is zero of rejecting a lot which
contains fewer than $D_0$ defectives.]
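The use of Table I can be written out as a short routine. The Python sketch below is an illustration under stated assumptions, not a prescribed procedure: the names `REJECT_MAX`, `ACCEPT_MIN`, and `decide` are hypothetical, the records are taken to be pairs ($d_m$, $m$) in order of occurrence, and a lot is treated as accepted when the acceptance $m$ for the current $d_m$ is passed before the next defective is recorded. The data are those recorded for Test 2 later in this appendix.

```python
# Columns (2) and (3) of Table I, indexed by d_m = 0..21 (None marks a dash).
REJECT_MAX = [None] * 7 + [371, 762, 1154, 1547, 1940, 2334, 2729, 3125,
                           3522, 3922, 4325, 4732, 5147, 5576, 6038]
ACCEPT_MIN = [1796, 2021, 2247, 2473, 2699, 2925, 3151, 3377, 3604, 3831, 4058,
              4285, 4513, 4741, 4970, 5200, 5431, 5664, 5900, 6140, 6388, 6655]

def decide(records):
    """Walk Table I over records [(d_m, m), ...] listed in order of occurrence.

    Returns (decision, d_m, m), where decision is 'accept', 'reject',
    'continue', or 'undefined' (more than D_0 = 21 defectives found).
    """
    d_prev, m_last = 0, 0
    for d, m in records:
        # Acceptance point for the previous count was passed before this defective.
        if ACCEPT_MIN[d_prev] < m:
            return ("accept", d_prev, ACCEPT_MIN[d_prev])
        if d >= len(REJECT_MAX):
            return ("undefined", d, m)
        limit = REJECT_MAX[d]
        if limit is not None and m <= limit:
            return ("reject", d, m)
        d_prev, m_last = d, m
    return ("continue", d_prev, m_last)

test_2 = [(2, 150), (3, 200), (5, 650), (6, 1050), (7, 1150), (8, 1400),
          (9, 1450), (10, 1850), (11, 2250), (12, 2400), (13, 2900), (14, 3000)]
print(decide(test_2))        # ('reject', 14, 3000)
print(decide([(1, 1900)]))   # ('accept', 0, 1796)
```

The second call shows the acceptance branch: if the first defective turns up only at the 1,900th unit, the lot would already have been accepted at 1,796 units.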

The following results were obtained in a sequence of three sequential 
tests performed on a lot of size 7,500 which was known beforehand to 
contain 40 defectives, and for which the other parameters defining the 




test were the same as in Table I; for the sake of convenience the sam- 
pling was performed in groups of size 50. 

TEST 1
Cumulative Observed       Corresponding Cumulative
Number of Defectives      Sample Size
d_m                       m
 1                          400
 2                          650
 3                          700
 4                          900
 6                        1,000
 7                        1,150
 8                        1,300
 9                        1,350
10                        1,850
11                        1,900 < 1,940   Reject the lot

TEST 2
Cumulative Observed       Corresponding Cumulative
Number of Defectives      Sample Size
d_m                       m
 2                          150
 3                          200
 5                          650
 6                        1,050
 7                        1,150
 8                        1,400
 9                        1,450
10                        1,850
11                        2,250
12                        2,400
13                        2,900
14                        3,000 < 3,125   Reject the lot

TEST 3
Cumulative Observed       Corresponding Cumulative
Number of Defectives      Sample Size
d_m                       m
 1                           50
 2                          100
 3                          150
 4                          350
 5                          500
 6                          550
 7                          950
 8                        1,000
 9                        1,050 < 1,154   Reject the lot


For sampling in groups of size 50 say, as in the above example, the
maximum $m$'s for rejection could be rounded down to the next multiple
of 50, i.e., to 350, 750, 1,150, 1,500, 1,900, etc., while the minimum $m$'s
for acceptance could be rounded up to the next multiple of 50, i.e., to
1,800, 2,050, 2,250, 2,500, 2,700, etc., with both systems of rounding
tending to decrease the risks $\alpha$ and $\beta$ of wrong decisions. If
inspection in groups should be decided upon, and the group size is not
so small as to create too large a table, then Table I could be remodelled
along the same lines as those for sequential sampling from a binomial
population. For example, if a group size of 200 were decided upon, the
remodelled table would take the form shown in Table II.


TABLE II
REARRANGED FORM OF TABLE I

No. of Groups    No. of Units    Acceptance No.    Rejection No.
Inspected        Inspected m     a_m               r_m

      1              200             -                 7
      2              400             -                 8
      3              600             -                 8
      4              800             -                 9
      5            1,000             -                 9
      6            1,200             -                10
      7            1,400             -                10
      8            1,600             -                11
      9            1,800             0                11
     10            2,000             *                12
     11            2,200             1                12
     12            2,400             2                13
     13            2,600             3                13
     14            2,800             4                14
     15            3,000             5                14
     16            3,200             6                15
     17            3,400             7                15
     18            3,600             *                16
     19            3,800             8                16
     20            4,000             9                17
     21            4,200            10                17
     22            4,400            11                18


In converting Table I into this form, we first rounded the maximum $m$'s
down to the next lower multiple of the group size (here 200) and rounded
the minimum $m$'s up to the next higher multiple of the group size. By using
Columns (1) and (3) of Table I, the acceptance number $a_m$ corresponding
to any cumulative sample size $m$ was obtained as the maximum
number of defectives $d_m$ permissible for acceptance. Similarly Columns
(1) and (2) yield the rejection number $r_m$ corresponding to any sample
size $m$ as the minimum number of defectives $d_m$ permissible for rejection.
If, then, at any stage of sampling the number of defectives $d_m$
either equals or exceeds the corresponding $r_m$, inspection terminates
with the rejection of the lot, while if it equals or drops below the
corresponding $a_m$, inspection terminates with the acceptance of the lot.
So long as $a_m < d_m < r_m$, sampling is continued. The starred (*) places in
the $a_m$ column indicate that acceptance is not possible for these cumulative
sample sizes. For instance, the first (*) in the column was used to
replace 0, which is not a valid acceptance number for 2,000 units since
with 0 defectives we should have accepted the lot after inspecting 1,800
units. We note finally that if no decision has been reached for $d_m=21$
defectives ($D_0$, in the general case) the basis of this sequential plan
ceases to exist.
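The conversion of Table I into acceptance and rejection numbers for a given group size can be sketched as follows. The Python below is illustrative only: the function name `regroup` and the list names are assumptions of this example, and a starred entry is produced whenever the largest admissible acceptance number was already attainable at an earlier group.

```python
# Columns (2) and (3) of Table I, indexed by d_m = 0..21 (None marks a dash).
REJECT_MAX = [None] * 7 + [371, 762, 1154, 1547, 1940, 2334, 2729, 3125,
                           3522, 3922, 4325, 4732, 5147, 5576, 6038]
ACCEPT_MIN = [1796, 2021, 2247, 2473, 2699, 2925, 3151, 3377, 3604, 3831, 4058,
              4285, 4513, 4741, 4970, 5200, 5431, 5664, 5900, 6140, 6388, 6655]

def regroup(reject_max, accept_min, group, n_groups):
    """Rows (groups inspected, units m, a_m, r_m) for inspection in groups."""
    # Rejection m's are rounded down, acceptance m's up, to multiples of the group size.
    rej = [None if r is None else (r // group) * group for r in reject_max]
    acc = [-(-a // group) * group for a in accept_min]
    rows = []
    for k in range(1, n_groups + 1):
        m = k * group
        # a_m: largest d_m whose rounded acceptance m has been reached.
        a = max((d for d, am in enumerate(acc) if am <= m), default=None)
        if a is not None and acc[a] < m:
            a = "*"   # the lot would already have been accepted at an earlier group
        # r_m: smallest d_m whose rounded rejection m has not yet been passed.
        r = min((d for d, rm in enumerate(rej) if rm is not None and rm >= m),
                default=None)
        rows.append((k, m, a, r))
    return rows

for row in regroup(REJECT_MAX, ACCEPT_MIN, 200, 22):
    print(row)
```

With a group size of 200 the rows produced in this way should agree with Table II, `None` corresponding to the blank places in the $a_m$ column.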

REFERENCE 


[1] Wald, A., Sequential Analysis. New York, John Wiley and Sons, 1947. 


CORRECTION TO “SOME NEW ASPECTS OF THE 
APPLICATION OF MAXIMUM LIKELIHOOD TO 
THE CALCULATION OF THE DOSAGE RE- 
SPONSE CURVE” 


JEROME CORNFIELD AND NATHAN MANTEL


In the paper “Some New Aspects of the Application of Maximum 
Likelihood to the Calculation of the Dosage Response Curve,” (Jour- 
nal of the American Statistical Association, Volume 45, 1950, pp. 181- 
210), the computations of Table IV on page 195 should read: 


(5—N)/K rather than (5—M)/K, for designation O; 
and R+PT rather than R+PYT, for designation U. 




BOOK REVIEWS 


Analysis and Design of Experiments. H. B. Mann (Professor of Mathematics,
Ohio State University). New York: Dover Publications, Inc., 1950. Pp. xiv,
195. $2.95.


REVIEW BY ANDREW G. CLARK, Colorado State College of Agriculture
and Mechanical Arts


In this monograph Professor Mann treats the subject of analysis of variance
designs strictly from the point of view of the mathematician. The
emphasis is placed upon a careful mathematical formulation of the assump- 
tions and the principles of statistical inference upon which the application 
of the statistical method is based. The formulas associated with the tests for 
the principal hypotheses incorporated in many of the designs generally 
classified as analysis of variance designs are rigorously developed. 

The author adopts a straightforward approach to his subject by first con- 
sidering the distribution of the sum of squares of a set of independent normal 
variables and then the distribution of the quotient of two such sums as the 
fundamental criterion employed in the application of analysis of variance 
tests. Next, the groundwork is laid for a later consideration of special de- 
signs by means of a treatment of the general class of linear hypotheses and 
the maximum likelihood ratio criterion. Coincidentally, there is a rigorous 
development of the formulas pertinent to the analysis of variance for first a 
one-way and then an r-way classification of a population. 

Logically, before a consideration of particular types of designs, there is 
an excellent treatment of the power of an analysis of variance test. In his 
introduction, Professor Mann expresses the hope that his book will be 
studied by practical experimenters and statisticians with resulting benefit. 
This reviewer shares in such a wish if only for the reason that the concept 
of the power of the analysis of variance test should be forcibly brought to 
the attention of the practical experimenter and that he should become as 
familiar with Tang’s tables and their use as he is with Snedecor’s table of F. 
In no other way can he determine what the F test may be expected to ac- 
complish. The experimenter may become acquainted with the specialized 
designs of the analysis of variance type, and he may become able to apply 
them and to analyze correctly the resulting data; but to what avail if he has 
no understanding of the sample size necessary to detect alternatives of speci- 
fied values? However, there is little reason to believe that the author’s hope 
that those engaged in practical experimental work will benefit from his labor 
can be realized. This book is written for the mathematician and not for the 
experimenter. There are no illustrative problems, no consideration of actual 
data; so that there is little likelihood that any popularization of the concept 




of the power of a test or of the use of Tang’s tables in experimental design 
will result. 

The remaining chapters beginning with VII are concerned with the special 
designs starting with the Latin squares and incomplete balanced blocks. 
Attention is directed to the manifold linear hypotheses subject to test and 
to their mathematical expression. Further, some of the combinatorial prob- 
lems associated with the construction of these designs are presented and 
solved by employing finite analytic geometries. One chapter deals briefly with 
the problem arising from non-orthogonal data. There next follow adequate 
treatments of the factorial experiments and the randomized and quasi-fac- 
torial designs. The chapter dealing with covariance is most brief; in fact, 
the analysis of variance as applied to problems of regression is slighted in this 
treatise. Tables for the F distribution and for the power function of the 
analysis of variance test are to be found at the conclusion of the book. 

In summary, Professor Mann’s excellent work is one which every mathe- 
matician in any way interested in the mathematical foundations of experi- 
mental design will find both useful and stimulating. The fundamental theo- 
rems and formulas associated with the subject are rigorously proved and 
derived with noteworthy directness. 


Industrial Experimentation (Third American Edition). K. A. Brownlee. Brook- 
lyn, New York: Chemical Publishing Co., Inc., 1949. Pp. 194. $3.50. 


REVIEW BY GEORGE W. SCHUSTEK, Standard Oil Co., Indiana


Most of the 43 additional pages in this edition are devoted to a welcome
expansion of the treatment of experimental design. In general the text is
identical to the previous edition, whose merits and shortcomings have been 
discussed in earlier reviews (March 1946 and December 1946). It is unfor- 
tunate that a book addressed, as this one is, to the statistically unsophisti- 
cated reader, should contain so many mechanical faults to undermine the 
confidence of the reader who seeks guidance in a strange field. Correction of 
typographical errors, a simplified notation, and a systematic numbering of 
the tables would increase the usefulness of this work to the experimenter. 
Nevertheless, it remains one of the best statistical “cookbooks” available. 


Biometrical Genetics. K. Mather. New York: Dover Publications, Inc., 1949.
Pp. ix, 158. $3.50.


REVIEW BY JAMES F. CROW, University of Wisconsin


In the fifty years that have passed since the rediscovery of Mendel’s principles
the mechanism of transmission of hereditary traits has been worked
out in great detail and practically all of this has been accomplished by the 
study of discontinuous characters. At the same time there has developed the 
science of biometrical genetics, at first considered a different and rival 




discipline but later integrated with Mendelian genetics through the work of 
R. A. Fisher and others, which treats the inheritance of continuous variation 
and which forms the subject matter of this book. 

The major part of the book deals with statistical procedures for the analy- 
sis of variance into non-heritable and heritable components and for further 
subdivision of the latter. The principal method is that of Fisher, Immer, 
and Tedin, although the third degree statistics introduced by these authors 
are mentioned only briefly. The analysis is chiefly adapted to plant breeding 
since most of the methods depend on crosses between pure lines from which 
F1, F2, F3, and backcross measurements can be made. A short section is
devoted to randomly mating populations and methods for determining vari- 
ance components from correlations between relatives. 

One chapter is devoted to an excellent discussion of the problems of scaling. 
That the basic biological processes might not be consistent with the scale 
of measurement used by the investigator has often been pointed out and 
various theoretically and empirically determined transformations have been 
advocated. Mather’s procedure is to choose a transformation that most 
nearly removes epistatic effects, leaving dominance to take whatever value it 
will. Criteria for choosing such a transformation are given. It may not be 
possible to find a transformation that is fully satisfactory, as would be ex- 
pected from the known complexity of gene interactions and as actual studies, 
particularly those of Powers, have shown. But it is usually possible to effect
some simplification of the analysis. Mather rightly emphasizes that the find- 
ing of a suitable transformation does not of itself justify any theoretical con- 
clusions about the manner of gene action. 

More than the usual emphasis is placed on linkage and this is the principal 
new contribution of the book. The data from which estimates of the effect of 
linkage can be made are the heritable variances in the F2 and F3, or in
successive backcrosses. Most previous models for continuous characters have
been based on an assumption of a large number of factors with random 
recombination. Mather has presented a model which considers linkage and 
supposes a moderate number of “effective factors,” these being statistically 
defined chromosome blocks. Whether linkage is such an important factor 
that these methods will result in generally better prediction formulae remains 
for future research to decide. 

Although the statistical aspects of the book are excellent, many geneticists 
would question some of the statements in the introductory chapter on the 
genetical basis for continuous variation. Mather has classified genes into 
two types, oligogenes and polygenes, which are responsible for discontinuous
and continuous characters respectively. The idea that polygenes are sometimes
located in the heterochromatin while oligogenes are not, or that they
differ in any respect other than in the magnitude of their effect, seems a priori
unlikely and an uneconomical hypothesis. However, the validity of the book
as a whole does not depend on the correctness of this hypothesis. 

As can be expected in a work by Mather, the book is well written and lucid. 




It does require some knowledge of statistics on the part of the reader, and 
anyone who expects to use these methods needs to be familiar with analysis 
of variance and matrix algebra. The methods of analysis and the need for 
large, well planned experiments are made very clear. I would prefer to have 
the book expanded to include more of the work of Wright and the application 
of this to problems of animal breeding which has been made so extensively 
by Lush and others. Only part of the field of biometrical genetics has been 
covered, but that part is done well. 


Giant Brains or Machines That Think. Edmund C. Berkeley. New York: John 
Wiley and Sons, Inc., 1949. Pp. xvi, 270. $4.00. 


REVIEW BY N. METROPOLIS, Los Alamos Scientific Laboratory


The author has succeeded in presenting a quite readable survey of automatic
computers as of the Class of 1948. The intended simplicity of
language in the discussion of technical matters makes the book didactically 
good. It should prove very useful for the applied mathematician who is con- 
fronted with a problem for which numerical procedure is the only recourse, 
and who is wondering what automatic computers are available, what is their 
speed and flexibility of operation, their capacity for storage of numerical 
data, and the preparation required to put a given problem in a tractable form 
for computer consumption. 

There is first a preliminary discussion of the desirable properties of a useful 
computer, followed by an instructive example on how to build a very simple 
model. Punch-card (or tape) techniques are then described. These have 
played a central role in the development of most of the large digital com- 
puters. In succeeding chapters pertinent details are given of (1) standard 
International Business Machines (IBM), (2) Harvard’s IBM Relay Calcu- 
lator, (3) Eniac, the first electronic computer, built by the University of 
Pennsylvania staff for Aberdeen Proving Ground, (4) Bell Laboratories Re- 
lay Calculator. 

Except for the Eniac, the other calculators discussed are natural exten- 
sions of the developments associated with the smaller, less flexible, electro- 
mechanical relay machines of the conventional IBM-type. It was the Eniac 
that first showed that the electronic tube as an “on-off” device was as reliable 
as the mechanical relay and, because of the inherent speed of electronic 
equipment, could perform operations many hundreds of times faster. 

The discussion is, in the main, devoted to digital calculators, and perhaps 
rightly so, since they are likely to play an important role in scientific com- 
putation. There is, however, also a chapter on analogue calculators that deals 
primarily with the differential analyzer at the Massachusetts Institute of 
Technology. The section on the Kalin-Burkhart Logical-Truth Calculator 
should prove particularly interesting to the “purer” mathematician. Al- 
though much ground is covered, the reader does not get entangled in a maze 




of detail. In fact, the treatment of the several “giants” is remarkably concise 
and generally relates to considerations that should stimulate greater interest 
in the use of such calculators. A very complete set of references is included as 
a supplement. 

Berkeley has exercised considerable care in describing the functional opera- 
tion of several large calculators now in existence. He has, perhaps, not given 
a proper perspective of the present state of the art. The calculators he 
describes represent first approximations to machines of really high speed, 
great flexibility, large storage capacity, and, at the same time, relatively 
small physical size. The latter are almost here. These will be available as lab- 
oratory instruments; as such they should provide stimulation in many fields 
of scientific endeavor. 

The book is the first of its kind; its account is lucid, and its publication 
is a timely one. 


Machine Computation of Elementary Statistics. Katherine Pease (Instructor of 
Psychology, Barnard College, Columbia University). New York: Chartwell 
House, Inc. (280 Madison Ave.), 1949. Pp. xii, 208. $2.75. 


REVIEW BY ALBERT H. BOWKER, Stanford University


According to the preface, “This manual is for students learning to use
computing machines in connection with courses in elementary statistical 
methods. It is set up to be self-teaching, so that the student, by following the 
procedures in sequence, may learn to use the machines with a minimum 
amount of help from the instructor.” A survey indicating widespread inter- 
est in such a manual prompted its preparation. 

Each chapter beyond the introductory ones is devoted to a particular 
arithmetical operation including addition and subtraction, multiplication (in- 
cluding accumulative and by a constant), division, square root (both by 
subtraction of successive odd numbers and by successive approximation), 
calculation of mean and standard deviation, calculation of the correlation co- 
efficient, and calculation of percentiles and standard scores. Within each 
chapter separate sections are devoted to detailed instructions for using 
various models of Friden, Marchant and Monroe machines, including those 
likely to be extant in university computing rooms. A number of problems, the 
same set for each type of machine, follow these instructions. An answer key 
is provided. 

The book is essentially a collection of sets of instructions of which the 
following from page 52 is typical: 

A. MULTIPLICATION OF WHOLE NUMBERS 
Friden 
EXAMPLES: (a) 246 × 38 = 9,348
(b) 6,789 × 5,309 = 36,042,801


1. Depress ADD key. 
2. Set 246 in the right of the keyboard. 




3. Enter 38 in the multiplier unit by touching the keys of the unit in the 
order in which they are read, i.e., from left to right, 3 and 8. Verify the entry by 
reading the proof register above the multiplier keys. 

4. Touch the multiply (MULT) key. The carriage automatically clears and
returns to position one; the multiplication takes place, the keyboard and multi- 
plier clear, and the product 9,348 appears in the upper dials. The multiplier 38 
appears in the lower dials. 

5. Without clearing the dials or returning the carriage to position, set the 
second multiplicand 6,789 on the right of the keyboard and enter 5,309 in the 
multiplier unit. Note that the 0 must be entered.

6. Touch the MULT key. The carriage automatically clears and returns to 
position one; the multiplication takes place, the keyboard and multiplier clear, 
and the product 36,042,801 appears in the upper dials. 

7. In the event of an error in entering a number in the multiplier, depress the 
multiplier correction (MULT CORR) key. This clears the multiplier without 
affecting the dial entries. If the add key is down, the keyboard also will clear; 
if up, the keyboard is unaffected by the MULT CORR operation. 


PROBLEMS: Multiplication of whole numbers. Check by multiplying again,
reversing the multipliers and multiplicands.


(1) 234 × 506 =
(2) 1,290 × 1,060 =


The instructions are clear and correct; a student following them and doing 
the practice problems would learn how to multiply. On the whole the book 
accomplishes its objective, although there is some danger that students may 
not be able to apply methods learned through such explicit instructions to a 
variety of problems. 

In teaching the use of machines, it seems to me that there are many ad- 
vantages to an informal demonstration by the instructor to a small group of 
students, followed by supervised practice problems and individual instruc- 
tions where necessary. While following a few pages of the sort quoted above 
is easy enough, students may find following page after page of such printed 
instructions tedious and might learn more readily from an actual
demonstration. Moreover, the instructor can be a little more flexible in his
directions: addition and multiplication may be done in any part of the
machine, etc. Whenever resources permit, I favor more instructional aid than 
this book apparently envisages. 

The format of the book is excellent; it has a spiral ring binding, lies flat, 
and is convenient to use beside a calculating machine. 


Workbook for Business and Economic Statistics. George H. Haines. Dubuque, 
Iowa: Wm. C. Brown Company, 1949. Pp. 118. $2.25. 


Review by Donald W. Paden, University of Illinois 


The instructor who has been using any of the standard problem manuals in 
connection with courses in business or economic statistics will find a certain 
amount of variety in Workbook for Business and Economic Statistics. 
The Workbook should be suitable for use with a number of different texts— 
11 are cross-referenced. Although the author has not made extensive use of 
the technique of partially completed problems, most of the questions are 
reasonably simple, straightforward, some even stereotyped. Graph paper, 
answer sheets, and problem forms are provided. 

In the opinion of the reviewer the most serious shortcoming of the Work- 
book is the fact that it is designed to test the ability of students to duplicate 
what has been learned via text or lecture rather than to stimulate the 
student to independent thinking. The treatment of correlation is fairly typi- 
cal, where 7 out of 11 questions are of the “compute, construct, or determine” 
variety. In this respect, however, it does not differ greatly from other com- 
peting problem manuals. 


On the Accuracy of Economic Observations. Oscar Morgenstern. Princeton, N. J.: 
Princeton University Press, 1950. Pp. ix, 101. $2.00. 


Review by Simon Kuznets, University of Pennsylvania 


This brief monograph, which should more properly be entitled “On the 
Inaccuracy of Economic Observations,” is a stimulating discussion of a 
perennial problem in economic statistics—the disturbing variety and shock- 
ingly large size of errors in our basic data. The author’s interest in the sub- 
ject was aroused when he considered the data available for a study of the 
structure of the American economy by means of input-output tables. The 
question as to whether the data warranted the extensive calculations to 
which they would have to be subjected in the preparation of even a simple 
set of these cross-section tables led Professor Morgenstern towards a wider 
review of the problem—with the useful if only preliminary results sum- 
marized in the monograph. 

The essence of Part I, General Considerations, is the section on character- 
istics of sources and errors in economic statistics—covering the full circle 
from the general failure to obtain data from designed experiments, through 
the ignorance of respondents or their withholding of information, to the 
failures of observers (most of them untrained), lack of definitions and 
clear-cut classifications, vagueness of time units, and back to the non- 
reproducibility of the historical events covered. Two, more interesting 
though briefer, sections deal with errors in accounting and with the specious 
accuracy customary in publications in the field, where the number of digits 
far exceeds any reasonable need and differences are often stressed that 
could not be significant given the crudity of the underlying data. 
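
The point about specious accuracy can be made concrete with a small sketch. It is offered only as an illustration: the helper names, the figures, and the assumed 5 per cent relative error are all hypothetical and are not taken from the monograph. The first function counts how many leading digits of a reported figure survive an error of a given relative size; the second uses a crude worst-case bound to ask whether two figures differ by more than their combined margins of error.

```python
import math


def meaningful_digits(value: float, relative_error: float) -> int:
    """Count the leading digits of `value` that an error of the given
    relative size leaves undisturbed (hypothetical helper)."""
    if value == 0 or relative_error <= 0:
        raise ValueError("value must be nonzero and relative_error positive")
    absolute_error = abs(value) * relative_error
    leading_place = math.floor(math.log10(abs(value)))
    error_place = math.floor(math.log10(absolute_error))
    return max(leading_place - error_place, 0)


def differ_meaningfully(a: float, b: float, relative_error: float) -> bool:
    """True only if the gap between a and b exceeds their combined error bounds."""
    return abs(a - b) > relative_error * (abs(a) + abs(b))


# Made-up figures, for illustration only.
reported = 1_234_567.0
print(meaningful_digits(reported, 0.05))        # digits worth printing
print(differ_meaningfully(100.0, 103.0, 0.05))  # a 3 per cent "change"
print(differ_meaningfully(100.0, 130.0, 0.05))  # a 30 per cent change
```

Under these assumptions a seven-digit figure carries only two digits of information, and a 3 per cent difference between two such figures cannot be distinguished from error.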

Part II presents a survey of errors revealed by comparing either primary 
statistics or synthetic estimates relating presumably to one and the same 
object (with or without minor sources of genuine differences); or errors in- 
dicated by the estimators themselves in an attempt to qualify their results. 
The errors and discrepancies in the statistics cited from fields of foreign 
trade, agricultural output, mineral output, national income, employment 
and unemployment, and prices, are large indeed; and may come as an 
unpleasant surprise to many, even statistically sophisticated scholars, who, 
however mindful of the weaknesses of data in the field of their specialty, tend 
to underestimate or overlook them in others. A useful bibliography is ap- 
pended. 

With Professor Morgenstern’s main conclusions: that more attention to 
inaccuracy of underlying statistics is needed; that rules followed, procedures 
used, and possible errors in the data be stated more fully and explicitly by 
compilers and publishers of both primary data and estimates; that a more 
critical attitude be adopted by users, with a greater effort toward internal 
checking of data—with all of these the reviewer is in wholehearted agree- 
ment. One can only hope that this monograph will be read by economists and 
statisticians of every persuasion. Their sharing with the author in the “shock 
of recognition” may have a healthy effect upon the practice and development 
of our discipline. 

But the reviewer would like to go a bit further and reflect upon a few 
points raised by Professor Morgenstern—largely because these reflections 
suggest somewhat different emphasis. 

1) Economic statistics, in an overwhelming proportion of cases, are prod- 
ucts or by-products of changing social institutions and relate to changing 
historical reality. Errors in such data are, therefore, complex and largely 
unique historical phenomena—which is but another way of saying that we 
are not dealing here with the results of designed, controlled experiments. 
Added to this situation is the weak position of the analytically-minded 
scholar-user vis-à-vis the large, bureaucratic producer of the data. Perhaps 
Professor Morgenstern should not have been surprised to find that less atten- 
tion is paid to errors in economic and social statistics than to those in data 
in the natural sciences. Lack of such attention may have been due less to a 
failure to recognize the existence of errors than to a feeling of helplessness 
and a realization of the difficulties involved in dealing with them effectively. 

2) Economic statistics are conventions, as in fact most measurements are: 
they reflect the magnitude of what people agree to recognize as profit, price, 
sale, etc. rather than what these phenomena may be as defined under 
imaginatively controlled conditions. In so far as economic statistics are 
records of institutional rather than experimental conventions, they are 
necessarily defective as measures of controlled concepts (although it is 
questionable whether one should call this “error”). But the data may have value 
as records of conventions. One may be interested in the series on “corporate 
profits” not only because it is an approximation (full of errors) to what profits 
truly are as defined by economists, but also because it reflects changes in 
what the business community sees as corporate profits, and to which it 
responds. 

3) The “specious” accuracy of many economic statistics lies in their 
character as conventions—particularly in the sphere of accounting. The proverbial 
search for the penny of discrepancy is due to the acceptance of a 
conventional classification as a guide, often in full recognition of the arti- 
ficiality of the convention. To permit any rounding off in these situations 
would, in absence of accepted and highly articulated limiting criteria, open 
the way to uncertainty and flabbiness in the convention, which would thus 
lose one of its fundamental values—that of rigidity. A card game would cer- 
tainly be spoiled were one to permit “minor” fluctuations in the value of 
cards or tricks. Professor Morgenstern’s comment on the speciousness of 
citing the public debt of the United States to the last penny is valid if the 
series is conceived as data for analysis; it is invalid if the series is viewed, as 
it is, as a convention the value of which lies in complete accountability within 
rigid rules. 

4) In most cases, either in research or in policy making, the investigator 
relies on a variety of data—rather than on a single series, no matter how 
broad or synthetic the coverage. E.g., judgments as to direction or even 
magnitude of change in an economy, an industry, a region, either in the 
short or the long run, usually rest upon a whole group of series. Professor 
Morgenstern’s discussion tends to be almost exclusively in terms of errors of 
single items or series. It leaves an unwary reader with the puzzling question 
as to how otherwise intelligent and honest men who work in the field can be 
so naive or unscrupulous as to use data subject to such errors or to have 
tolerated the scandalous situation for so long. The answer lies in the fact 
that those who do their task with caution and thorough preparation attempt 
to rely on a consensus of various data, and even so qualify their conclusions. 
This is not to suggest that a judgment based on consensus is not in 
turn subject to possible errors, or that Professor Morgenstern’s call for more 
examination and checking is not fully warranted. But one is justified in 
arguing that in such consensus judgments at least some of the most egregious 
errors affecting single items are not likely to be given much weight. 

5) Raw materials for this type of usage can be provided by data relating 
to small areas as well as to large ones, long series as well as short series. One 
important explanation of the tendency towards a vast amount of detail and 
“specious” accuracy lies in the hope of the producers of data that all detail 
will be grist to the mill. And while Professor Morgenstern is right in censur- 
ing the habit of government agencies of providing data to an excessive 
number of digits, any attempt to economize by reducing their number must 
take cognizance of the necessity of preserving detail: for small areas as well as 
for large; for the early as well as for the later segments of a long time series. 

6) Because errors in the economic data are essentially complex historical 
phenomena, the fundamental difficulty is bound to remain. Methods of free- 
ing data from their historical conditions are exceedingly limited; and reduc- 
ing the difficulty by means of an empirically founded theory that would 
yield constants (like those in the natural sciences) is not likely in the face of 
rapidly changing historical reality. We must, therefore, be prepared for a 
situation in which economic and social statistics will continue to be affected 
by large errors; and in which quantification of such errors will remain a 
difficult, and often impossible, job of laborious ad hoc detection. 

Nothing in these comments detracts from the force of Professor Morgenstern’s 
emphasis on need for greater attention to the problem. But they 
should serve to indicate a dissent from Professor Morgenstern’s tendency in 
his discussion to set up the natural sciences as a feasible ideal; to understate 
the institutionally changing elements in economic statistics; to dwell pri- 
marily upon errors in single items or series, thus overlooking the variety of 
practices used in dealing with the data, either in scientific research or in 
policy formulation. The reader might too easily conclude from Professor 
Morgenstern’s discussion that the trouble lies largely in the lack of attention 
paid by economists and statisticians to the problem; and that increased at- 
tention would go far towards solving the problem, as it has been solved in 
the natural sciences. To such a conclusion the reviewer, and most likely also 
the author, would enter a strong objection. 

One may again express the hope that this monograph will find a wide 
reading public, and that the author will pursue his explorations further. Such 
explorations will undoubtedly lead him to deal with the factors that determine 
the production of economic data, and into an analysis of the ways in 
which the data, with all their errors and biases, are in fact used. Both of these 
directions of research deserve more than the scanty attention that has been 
paid them so far. 


Statistical Indicators of Cyclical Revivals and Recessions. Geoffrey H. Moore 
(Associate Director of Research, National Bureau of Economic Research, Inc.). 
Occasional Paper 31, New York: National Bureau of Economic Research, Inc. 
1950. Pp. 95. $1.50. 


Review by Walter E. Hoadley, Jr., Armstrong Cork Company 


Business forecasting, it is commonly agreed, is a very hazardous undertaking. 
Yet, there is no real escape in business or elsewhere from the 
need to forecast. The principal question is not, Shall forecasts be made? but 
rather, How can forecasts be made more satisfactorily? Consequently, any 
study which explores some of the vast uncharted sea of timing and ampli- 
tude relationships among statistical indicators, and suggests new or improved 
methods for anticipating cyclical changes in economic activity is to be wel- 
comed. Dr. Moore’s comparatively brief but nonetheless intensive mono- 
graph meets these conditions, and probably offers greater possibilities for 
useful application to direct business forecasting than any other publication 
of the National Bureau of Economic Research (NBER) to date in this field. 
In accord with common NBER practice, the emphasis in this study is al- 
most exclusively upon statistical refinement of collected economic series. 
Analysis of causation of observed tendencies and relationships is explicitly 
excluded. Few organizations other than the NBER could have undertaken 
the vast statistical work underlying this study, which is still considered to be 
in a tentative stage. The immediate aim is to revise and bring up to date 
“Statistical Indicators of Cyclical Revivals” (NBER Bulletin 69), written 
by Mitchell and Burns in 1938, and to extend the analysis to cover recessions 
as well as revivals. 

Business cycle analysts will find added precision given to several signifi- 
cant economic series relationships which they may have worked out previ- 
ously on only a limited basis, if at all. The principal specific conclusions 
reached by Dr. Moore are: (1) economic processes, as represented by monthly 
and quarterly time series, differ widely in the timing of their fluctuations 
during business cycles; and (2) it is possible by objective statistical methods 
to select series which may have useful characteristics as leaders, coinciders, 
or laggers in anticipating and identifying cyclical revivals and recessions. 
Other noteworthy findings include: “it would be unwise to place sole reli- 
ance on a single indicator (p. 45)”—a not uncommon practice; and “it is 
evidently easier to find advance indicators of revivals than recessions (p. 
37)”—almost invariably the most needed is the most difficult to obtain. 

Methodologically the monograph is linked strongly to the “reference” 
cycles established by the NBER staff as representing their best statistical 
judgment of the exact pattern of general business cycles observed in the 
past. Specific series are rated as to their value as indicators of revivals and 
recessions on the basis of two criteria: (1) the consistency with which their 
movements have conformed to business cycles, and (2) the consistency with 
which their turning points have led, lagged, or roughly coincided with the 
reference dates. 

From the NBER collection of 801 economic series, 21 (the same number 
as Mitchell and Burns used, but including seven new series) have been tenta- 
tively selected as the most promising indicators of over-all economic revivals 
and recessions. The eight leader series are: business failures, common stock 
prices, new orders in durable goods industries, residential building contracts, 
commercial and industrial building contracts, average hours worked per 
week, new incorporations, and basic commodity price index. The eight coin- 
ciders are: nonagricultural employment, unemployment, corporate profits, 
bank debits, freight car loadings, industrial production, gross national 
product, and wholesale prices, excluding farm products and foods. The five 
lagging series are: personal income, retail sales, consumer instalment debt, 
bank rates on business loans, and manufacturers’ inventories. 

In Appendix A is to be found what may prove to be a highly useful sug- 
gestion for identifying a change in general business. The procedure is to 
record both the direction of change of key series and the number of months 
the series have been moving in that direction. “The reason for observing runs 
is that the longer the run the more likely it is to correspond in direction with 
the cyclical phase of the series.” A rather critical average duration of runs 
for 15 tested series was about 3 months, indicating any concerted nonseasonal 
movement of 3 months has generally meant a basic change in business. 
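
The run-recording procedure described above can be sketched as follows. This is a minimal illustration and not Dr. Moore's actual computation: the function name, the rule that an unchanged month ends a run, the sample values, and the use of three months as a threshold are all assumptions made for the example.

```python
from typing import List, Tuple

RUN_THRESHOLD = 3  # months; roughly the critical duration cited in the review


def current_run(series: List[float]) -> Tuple[int, int]:
    """Return (direction, months) for the run ending at the latest observation.

    direction is +1 for a rise, -1 for a fall, 0 if the latest month shows
    no change; months counts consecutive month-to-month changes of that sign.
    """
    def sign(x: float) -> int:
        return (x > 0) - (x < 0)

    if len(series) < 2:
        return 0, 0
    direction = sign(series[-1] - series[-2])
    if direction == 0:
        return 0, 0
    months = 0
    # Walk backward from the latest change until the direction breaks.
    for i in range(len(series) - 1, 0, -1):
        if sign(series[i] - series[i - 1]) != direction:
            break
        months += 1
    return direction, months


# Made-up monthly values, assumed already adjusted for seasonal variation.
example = [100.0, 102.0, 101.0, 99.0, 98.0, 97.0]
direction, months = current_run(example)
print(direction, months, months >= RUN_THRESHOLD)
```

Applied to each key series in turn, such a record gives both items the procedure calls for: the direction of change and the number of months the series has moved in that direction.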

NBER research now has established a considerable quantity of statistical 
facts covering past cycles, which should provide the basis for further analysis 
and understanding of cyclical movements in economic activity. How much 
further this particular line of investigation of statistical series movements 
and relationships can be pursued profitably, however, largely in the absence 
of clear-cut attention to the “why” as well as the “where” of series, is at least 
open to question. 

The NBER probably has an unequalled store of American economic time 
series and allied statistical refinements, and no doubt will continue to sup- 
plement this valuable documentary work. Since one of the principal findings 
of NBER cycle research is that “no two business cycles are exactly alike,” 
it is to be hoped that future NBER investigations will direct increased at- 
tention to the causative phases of the cycle as well as to more definitive 
measures of its occurrence patterns. Otherwise, there is danger that once 
again those who have completed the most extensive research on key eco- 
nomic matters—but who are nonetheless unable or unwilling to direct these 
findings forcefully toward policy ends—will find themselves largely ignored 
or interpretation of their work left to less informed individuals. The popularization 
of Dr. Moore’s present study in the business and financial press¹ is ample 
proof of both the strong demand for results of studies of cyclical indicators 
and also the readiness of many individuals to interpret cautiously-stated 
findings well beyond original intentions—but perhaps not entirely beyond 
what some of the findings actually may warrant. 


Studies in Income and Wealth, Volume Twelve. New York: National Bureau of 
Economic Research, Inc., 1950. Pp. xiv, 585. $6.00. 


Review by John B. Canning, Stanford University 


The volume presents the results of the Conference on Research in Income 
and Wealth held by the National Bureau of Economic Research in 
January 1948. It consists of an Introduction, the eleven main papers and 
the written and oral discussions of these presented at the Conference, and an 
Index. The subject is the projection and partial fulfillment of social balance 
sheets for the United States in 1929, 1939 and 1946. 

All contributors were asked to fit their results into a pro-forma balance 
sheet of 13 domestic asset items other than claims, and 4 domestic claims and 
liability items with identical columnar components for nine sectors of the 
economy. American foreign investments are, in part, set off against foreign 
investments in the United States. In a second exhibit each sector’s claims 
against each other sector are shown in a contingency table. Part I contains 
the papers and discussion dealing with the problems of measuring wealth for 
social balance sheet purposes and with the uses of the resulting exhibit and 
its data. Part II contains the papers and discussions dealing with the nine 
sectors in the columnar breakdown noted above. 


¹ E.g., “How to Pin Point the Business Cycle,” Business Week, June 10, 1950, pp. 26-28. 




The Conference was timely and wisely motivated. By contrast with our 
ordered knowledge of national income and its components and of its distri- 
bution among final recipients we know little about our total and component 
wealth funds and less about intermediate and ultimate distributions of 
beneficial interests in such funds. This shortfall seriously hampers economic 
analysis and decisions. 

Despite its shortcomings, the volume will reward any competent reader if 
he will but “Read not to believe nor to contradict but to weigh and consider.” 
This is especially true of Part II; for, whatever one may think of the 
particular sectoring of the economy or of the hopes to merge the resulting 
data in a national balance sheet, the individual papers are worthy contribu- 
tions. For example, Burroughs’ showing, both at contemporary prices and 
1939 prices, of agriculture’s successive asset and liability positions at the be- 
ginning of a major depression, at the transition from partial recovery to 
the fury and disruption of a great war, at the conclusion of that war, and 
after a year of postwar readjustment, together with the correlative income 
summations reflects great credit both upon him and upon the Bureau of Agri- 
cultural Economics. 

The reviewer’s study of accountancy and economics and his experience in 
war and post-war supply services make him extremely doubtful that any con- 
solidated national balance sheet can ever be a good—much less, the best 
available—device to serve the many different uses contemplated by the Con- 
ference. Too much of our resources (e.g., personnel and know-how) are 
omitted or blurred. 

Sector balance sheets with appropriate sub-sectoring, on the other hand, 
could readily become highly useful parts of knowledge about resource funds 
and beneficial interests. Each could be designed both for the major need for 
fund knowledge of its sector and for showing the adaptability of the sector 
resources to major contingencies. In each sector the successive balance 
sheets would need, as a minimum, to be adjoined to an analysis of the 
sector’s human resources and to a summation and distribution of income. 
To serve a maximum number of uses best there could be two categories of 
sectors: a group of vertical sectors dealing with the major primary wants; 
and a cross-economy service group. The first group would include such sec- 
tors as the entire food supply service, domestic housing and house furnishing, 
education and entertainment, etc. The second would include military se- 
curity, common carriers, etc. These two groups differ markedly from one 
another in response to the impacts of depression and war. Within each group 
the several sectors respond in different degree. With national consolidation 
dropped out, the whole job does not need to be done at once; and a minor 
amount of double inclusion, if identified, is not serious. 


Employment and Compensation in Education. George J. Stigler (Columbia Uni- 
versity). Occasional Paper 33. New York: National Bureau of Economic Re- 
search, Inc., 1950. Pp. 77. $1.00. 


Review by F. G. Cornell, University of Illinois 


Educational institutions in the United States constitute a very large 
industry employing over a million persons. The Stigler publication is a 
descriptive statistical survey, as the title suggests, of teacher employment 
and compensation. Teacher employment trends are examined on the basis of 
the most obvious contributing factors, namely, school age population, school 
attendance rates of school age population, and pupil-teacher ratios. Stigler 
reviews teacher certification, supply and demand, and teacher recruitment. 
Teacher salary variations are examined from the standpoint of community 
size, grade level, age, training, sex, and race. Similarly, an analysis is made 
of academic personnel in higher educational institutions. 

In general, the publication exhibits the type of objectivity to be expected. 
The author exercises good judgment in frequent use of classifications such 
as those for age of teachers, rank of professors, size of community, level of 
school, region of country, urban-rural, and the like which yield reasonably 
comparable statistical series. The cost of living index is used to deflate 
teachers’ salaries with full recognition of the limitations of this procedure. 

The usefulness of the publication is hampered by the fact that it is based 
in some cases on pre-World War II data. There are several references which 
the author might have found useful. For instance, a widely read series of 
reports in the field of higher education by the President’s Commission on 
Higher Education contains estimates of trends in enrollment and staffing. 
In the elementary-secondary field some use might have been made of the 
Report of the Cooperative Study of Public School Expenditures, An Inventory 
of Public School Expenditures in the United States, published by the American 
Council on Education in 1944. 

It is doubted that the author faces squarely some of the sampling problems 
of the report. For example, salary-trend analysis is made on statistics from 
land-grant institutions. One series used was based on a “sample” of 305 of the 
1700 higher educational institutions published in 1927, long before sensible 
sampling ideas had been applied to educational statistics. Some causal rela- 
tion statements are made on tenuous grounds. A .56 correlation among the 48 
states in 1940 between the percentage of unemployed persons and the per- 
centage of school age children enrolled in schools, for instance, is used as sup- 
port for the conclusion that children stay in school when employment oppor- 
tunities are restricted and business conditions are depressed. Differences 
among the states in teachers’ salaries and in length of school term are at- 
tributed to the wealth of states. 

It is not clear whom the publication was intended to serve. The analyses on 
a national basis may be informative to some, but it is not clear to this re- 
viewer just what purposes were held in mind by the author. 




The Cost and Financing of Social Security. Lewis Meriam, Karl T. Schlotterbeck, 
and Mildred Maroney. Washington, D. C.: The Brookings Institution, 1950. Pp. 
ix, 193. 


Review by H. W. Steinhaus, The Equitable Life Assurance Society 


This study presents estimates of the cost of social insurance benefits of the 
type proposed by H.R. 2893. While the estimates were being prepared 
H.R. 6000 was introduced which is much closer to the social security 
changes that will probably be adopted this year than are the more radical 
proposals of H.R. 2893. However, the authors continued with an evaluation 
of H.R. 2893 because they felt that it shows the full measure of the adminis- 
tration’s conception of what is essential to the public welfare. The study first 
reviews proposed benefits, pointing out in footnotes the major differences 
between H.R. 6000 and H.R. 2893. There is a chapter on available statistics 
of low-income individuals, a chapter summarizing the mass of temporary 
and permanent veterans’ benefits, and one dealing with private pension 
plans. These reviews are done in the thorough and competent manner one has 
come to expect from Brookings Institution publications. 

The cost estimates are mostly from sources within the Social Security 
Board. With respect to Old Age, Survivors, and Permanent Disability Bene- 
fits, Actuarial Study 28 is quoted, which presents estimates under high and 
low cost assumptions. The low cost of temporary disability is based on an 
estimate by Arthur Altmeyer. The low cost of compulsory health insurance 
is based on estimates by I. S. Falk, the high cost on estimates by Elizabeth 
Wilson, who, however, uses British experience with benefits which go even 
beyond those advocated by the administration. All other estimates appear to 
be rough guesses by the authors. There are no references to the basic premises 
on which these cost projections were built. There is a need to explain in read- 
ily understandable language the reasoning leading to estimates in such 
fields as population forecasts (involving mortality, births, marriages) and 
economic cycles (involving levels of earnings, prices, living costs, employ- 
ment), but this book does not attempt to do so. However, the analysis does 
bring out the tremendous liabilities for social expenditures involved in a 
comprehensive welfare program. 

There is a brief review of methods of financing social security programs 
and the arguments are in favor of a pay-as-you-go system, but the argu- 
ments quoted are not the best that could have been picked. Particularly 
spurious is the argument that the economic consequences of a Government 
investment in Government Bonds differ from private investment in Govern- 
ment Bonds. In either event, the Government would have to face repayment 
by means of new taxation or borrowing. 

In its concluding chapters the study touches on budgetary questions, on 
fiscal complications, and on political implications, and then reaches some 
conclusions as to what is a sound Social Security system and how it should 
be financed. In doing this the study returns to the controversial findings of 
an earlier Brookings Institution study, Relief and Social Security (1946), in 
which Lewis Meriam suggested a universal relief system, based on need only, 
an idea which would turn the clock back some 100 years. It is perhaps 
true that substantially more than half the money for a no-means-test welfare 
system is used to pay benefits to persons who are not in any need, but on the 
other hand means-test systems actually used in some states have found as 
many as 85% of the aged population in “need,” so there may be more 
savings in theory than in practice. It is also true that promises of high social 
security benefits are easy to make now but hard to accomplish later and may 
therefore lead to inflationary repudiation. But, on the other hand, what 
would be the economic effect if organized labor were to obtain similar guaran- 
tees from American industry but under funded retirement plans and without 
contributions by labor? 

The authors have touched on many of these problems and have raised 
other questions which require much further study. In spite of some short- 
comings this book is a valuable addition to the limited literature of the 
impact of public welfare measures on our economy. 




PUBLICATIONS RECEIVED 


Braun, Kurt. The Right to Organize and 
Its Limits. The Brookings Institution, 
Washington, D. C. 1950. $3.00. 

Burns, Arthur F. New Facts on Business 
Cycles. National Bureau of Economic Re- 
search, New York. 1950. Paper. 

Carnap, Rudolf. Logical Foundations of 
Probability. The University of Chicago 
Press, Chicago. 1950. $12.50. 

Davis, Harold E. Social Science Trends 
in Latin America. American University 
Press, Washington, D. C. Paper. 

Deming, Wm. Edwards. Some Theory of 
Sampling. John Wiley & Sons, New York. 
1950. $9.00. 

Feller, William. Introduction to Probabil- 
ity Theory and Its Applications, Volume 1. 
John Wiley & Sons, New York. 1950. $6.00. 

Freedman, Paul. The Principles of Scien- 
tific Research. Public Affairs Press, Wash- 
ington, D. C. 1950. $3.25. 

Grant, Eugene L. Principles of Engineer- 
ing Economy, Third Edition. Ronald Press 
Co. New York. 1950. $5.00. 

Hultgren, Thor. Cyclical Diversities in 
the Fortunes of Industrial Corporations. 
Occasional Paper 32, National Bureau of 
Economic Research, New York. 1950. 50 
cents. Paper. 

India, Government of. Directorate of Industrial 
Statistics. First Census of Manufactures—1946. 
New Delhi. 

Jeffreys, James B. The Distribution of 
Consumer Goods. Cambridge University 
Press, Cambridge. 1950. $7.50. 

Keezer, Dexter M. and Associates. Mak- 
ing Capitalism Work. Whittlesey House, 
McGraw-Hill Book Co., Inc., New York. 
1950. $3.50. 

Kimmel, Lewis H. Taxes and Economic 
Incentives. The Brookings Institution, 
Washington, D. C. 1950. $2.50. 

Klein, Lawrence R. Economic Fluctuations
in the U.S., 1921-41. John Wiley &
Sons, New York. 1950. $4.00.

Koopmans, Tjalling C., ed. Statistical In- 
ference in Dynamic Economic Models. 
John Wiley & Sons, New York. 1950. $6.00. 

Loomis, Charles P., and J. Allan Beegle.
Rural Social Systems. Prentice-Hall, Inc., 
New York. 1950. $6.75. 

Mund, Vernon A. Government and Busi-
ness. Harper and Brothers, New York.
1950. $4.75.

Neyman, J. A First Course in Probability 
and Statistics, Volume I. Henry Holt and 
Co., New York. 1950. $3.50. 

Studies in Business Economics, Vol. IV, 
No. I. Metropolitan Washington after 150 
Years, Its Economic Expansion. College of 
Business and Public Administration, Uni- 
versity of Maryland, College Park, Mary- 
land. 1950. Paper. 

Office of Naval Research, Human Re- 
serve Division. The Development of a Test 
for Selecting Research Personnel. Ameri- 
can Institute for Research, Pittsburgh, Pa. 
1950. Paper. 

Stouffer, Samuel A. et al. Studies in 
Social Psychology in World War II, Volume 
4: Measurement and Prediction. Princeton 
University Press, Princeton. 1950. $10.00. 

United Nations World Statistical Con- 
gress. Proceedings of the International Sta- 
tistical Conferences, September 6-18, 1947. 
Eka Press, Calcutta. Paper. 

U. S. Bureau of the Census. Catalog of 
U. S. Census Publications. U. S. Govern- 
ment Printing Office, Washington, D. C. 
1950. $1.50. 

U. S. Bureau of Labor Statistics. Pub- 
lications of the Bureau of Labor Statistics. 
1950. 

Woodworth, George Walter. The Mone- 
tary and Banking System. McGraw-Hill 
Book Company, New York. 1950. $5.00 




Outstanding Books 


INTRODUCTION TO STATISTICAL ANALYSIS 


By WILFRID J. DIXON and FRANK J. MASSEY, JR., University
of Oregon. Ready in January

This unique text presents the basic concepts of statistics in a manner
which will show the student the generality of the application
of the statistical method. Both classical and modern techniques
are presented with emphasis on the understanding and use of the
technique.

MARKETING RESEARCH

By ERNEST S. BRADFORD, Manhattan College. In press

Discusses the principles and procedures employed in successful
marketing research. Treats the basic elements such as how to
analyze products and services, how to determine the character and
volume of consumer demand, and what channels to use in order
to get the product or service to market. Emphasis throughout is
on securing and presenting marketing data.

INTRODUCTION TO THE THEORY OF STATISTICS

By ALEXANDER MOOD, Rand Corporation, Santa Monica,
Calif. 431 pages, $5.00

A text for standard courses in statistical theory with a calculus
prerequisite. The author first develops the theory of probability,
distribution and sampling, and then proceeds to explore the two
major problems of scientific inference: the estimation of quantities,
and the testing of hypotheses.


BUSINESS STATISTICS. New 3rd edition
By JOHN R. RIGGLEMAN and IRA N. FRISBEE, University of
California, Los Angeles. In press

Here is the new edition of this successful book which emphasizes
the actual applications of statistics to business problems. Its aim
throughout is to lead the future businessman to appreciate the
usefulness of statistical methods and to employ them in practical
business problems.


Send for copies on approval 


McGRAW-HILL BOOK COMPANY, Inc.


330 West 42nd Street New York 18, N.Y. 


Please mention the Journal of the AMERICAN STATISTICAL ASSOCIATION in writing advertisers 




POPULATION STUDIES 


A QUARTERLY JOURNAL OF DEMOGRAPHY 


Editor D. V. GLASS 
Vol. IV, No. 1 CONTENTS June 1950 
L. T. BADENHORST. The Future Growth of the Population of South Africa 
and its Probable Age Distribution. 
NATHAN KEYFITZ. The Growth of Canadian Population.
G. W. ROBERTS. A Note on Mortality in Jamaica.
S. SHAPIRO. Development of Birth Registration and Birth Statistics in the
United States.
STEFAN SZULC. The Sample Census of Population in Poland, 1949.
SVEN MOBERG. Marital Status and Family Size among Matriculated Persons
in Sweden. 


Subscription price per volume is $5.00 net, postfree 


Published for the Population Investigation Committee by 


CAMBRIDGE UNIVERSITY PRESS 
London: 200 Euston Rd. N.W. 1. New York: 51 Madison Avenue. 


JOURNAL OF FARM ECONOMICS 
Published by THE AMERICAN FARM ECONOMIC 
ASSOCIATION 


Editor: Walter W. Wilcox 
Library of Congress, Washington, D.C. 


Volume XXXII NOVEMBER, 1950 Number 4 
The European Recovery Program .......... J. H. Richter
The Economics of Land Classification for Irrigation .......... Wallace McMartin
The Politics of Agriculture in the United States .......... Charles M. Hardin
Some Economic Changes in Food Manufacturing .......... Allen B. Paul


Also Proceedings of 1950 Annual Meetings 
Published as a Supplement 


This JOURNAL, a quarterly, contains additional articles, notes, reviews of books,
and a list of recent publications and is published in February, May, August, and 
November by the American Farm Economic Association. Yearly subscription, $5.00. 


Secretary-Treasurer: L. H. Simerl 
Department of Agricultural Economics 
University of Illinois, Urbana 




LABORATORY MANUAL FOR 
ELEMENTARY STATISTICAL METHODS 
As Applied to Business and Economic Data 
Revised Edition 
By Neiswanger, Haworth & Leavitt 


This manual is suitable for a first course in Statistics in the Depart- 
ment of Economics and Business Administration and can be used 
with any textbook in the subject. $2.60 


COST ACCOUNTING AND ANALYSIS 


By Carl T. Devine 


This book offers a broad coverage of the subject, integrating financial 
ry cost accounting, management, market analysis and economics. 
$5.00


THE MACMILLAN COMPANY 
60 Fifth Avenue, New York, NY. 


PHILOSOPHICAL LIBRARY PUBLICATIONS 


GEOGRAPHY IN THE 20TH CENTURY 

Edited by Griffith Taylor, University of Toronto
Twenty authors, each a specialist in his field, treat the growth, fields, techniques, aims
and trends of geography.


From the Contents

Part I—EVOLUTION OF GEOGRAPHY AND ITS PHILOSOPHICAL BASIS
The French and German Schools of Geography
The West Slav Geographers
Environmentalism and Possibilism

Part II—THE ENVIRONMENT AS A FACTOR
The Progress of Geomorphology
Geographical Aspects of Meteorology
Climatic Influences
Soils and Their Geographical Significance
Settlement of the Modern Pioneer
Geography and Arctic Lands
Exploration of Antarctica
Geography and the Tropics
Geography and Regionalism

Part III—SPECIAL FIELDS OF GEOGRAPHY
Geography is a Practical Subject
Geography and Empire
Racial Geography
The Sociological Aspects of Geography
Urban Geography
Geography and Aviation
The Field of the Geographical Society
Geography in Practice in The Federal Government, Washington
Geopolitics and Geopacifics


56 charts, maps and illustrations with a glossary of geographical terms. Over 600 pages.


20TH CENTURY ECONOMIC THOUGHT 

Edited by Glenn E. Hoover, Mills College 
In this stimulating book, twenty of the most perplexing economic problems of our time 
are analyzed by professional economists. Each contributor was asked to express his 
opinion freely, without reference to the opinions of the others. Although written primarily
for the general reader, students and teachers of economics will find it invaluable. $12.00 


LABOR DICTIONARY By Paul Hubert Casselman, University of Ottawa 
Prepared to supply the need for a concise reference guide for matters concerning labor. 
Contains nearly 2500 entries, consisting of definitions of labor terms, biographical sketches 
of labor leaders, labor legislation acts and numerous other entries. $7.50


PHILOSOPHICAL LIBRARY, Publishers 
15 East 40th Street, Desk 497, New York 16, N.Y. 


Expedite Shipment by Prepayment 
Special student bulk rate on 10 or more 




ECONOMETRICA 


Journal of the Econometric Society 


Contents of Vol. 18, October, 1950, include: 


N. F. MOREHOUSE, R. H. STROTZ, AND S. J. HORWITZ ......... An Electro-Analog
Method for Investigating Problems in Economic Dynamics: Inventory Oscillations

LLOYD A. METZLER .............. A Multiple-Region Theory of Income and Trade

OLAV REIERSØL .............. Identifiability of a Linear Relation
between Variables which Are Subject to Error


List of Members of the Econometric Society 
Geographical List of Members and Subscribers 
Announcements, Notes, and Memoranda 


Published Quarterly Subscription to Nonmembers: $9.00 per year 


The Econometric Society is an international society for the advancement of 
economic theory in its relation to statistics and mathematics. 


Subscriptions to Econometrica and inquiries about the work of the Society and 
the procedure in applying for membership should be addressed to William B. Simpson, 
Secretary, The Econometric Society, The University of Chicago, Chicago 37, Illinois, 
U.S.A. 


ACCEPTANCE SAMPLING
A Symposium


now available—$1.50


ACCEPTANCE SAMPLING BY ATTRIBUTES
Developments Prior to 1941 ..... Paul Peach
Wartime Developments ..... E. G. Olds
ACCEPTANCE SAMPLING BY VARIABLES
Acceptance Sampling by Variables, with ..... J. H. Curtiss
Use of Variables in Acceptance Inspection
for Per Cent Defective ..... Allen Wallis
Chairman's Closure ..... J. W. Tukey


AMERICAN STATISTICAL ASSOCIATION
1108 Sixteenth Street, N.W.
Washington 6, D.C.


ECONOMICS

200,000 Out-of-Print Economic Books
Libraries Bought
Libraries Sold
Frequent Catalogues


BURT FRANKLIN
170 Broadway
New York 7, New York
Beekman 3-4723




National Bureau of Economic Research— 


New Publications 


Books 


What Happens during Business Cycles: A Progress Report. 
By Wesley C. Mitchell. 304 pages. 43 tables. 7 charts. $4.00. 


Inventories and Business Cycles, with Special Reference to Manufacturers’
Inventories. By Moses Abramovitz. 672 pages. 116 tables. 102
charts. $6.00.

Impact of Government on Real Estate Finance in the United States. 
By Miles L. Colean. 190 pages. $2.50. 

Urban Mortgage Lending by Life Insurance Companies.
By R. J. Saulnier. 202 pages. 36 tables. 11 charts. $2.50.

Studies in Income and Wealth, Volume Thirteen. By Conference on
Research in Income and Wealth. 544 pages. $6.00.

Taxable and Business Income. By Dan Throop Smith and J. Keith 
Butters. 368 pages. 28 tables. 6 charts. $4.00. 

The Transportation Industries, 1889-1946: A Study of Output, Employ-
ment, and Productivity. By Harold Barger. 320 pages. 38 tables. 25
charts. $4.00.


Occasional Papers 
Statistical Indicators of Cyclical Revivals and Recessions. 
By Geoffrey H. Moore. 104 pages. 17 tables. 10 charts. $1.50.
Cyclical Diversities in the Fortunes of Industrial Corporations.
By Thor Hultgren. 40 pages. 7 tables. 12 charts. $0.50.
Employment and Compensation in Education. By George J. Stigler.
88 pages. 40 tables. 8 charts. $1.00. 
Behavior of Wage Rates during Business Cycles. By Daniel Creamer. 
72 pages. 9 tables. 5 charts. $1.00. 
Shares of Upper Income Groups in Income and Savings. 
By Simon Kuznets. 72 pages. 21 tables. 1 chart. $1.00. 


National Bureau of Economic Research, Inc. 
1819 Broadway New York 23, N.Y. 


Information on subscription rates and a complete list of publications will be 
mailed on request. 




BIOMETRIKA 
Vol. 37 Pts. 3 & 4 


CONTENTS 


A simple stochastic epidemic. By N. T. J. BAILEY. 

On the Fisher-Behrens test. By G. A. BARNARD. 

The incomplete Beta Function as a contour integral and a quickly converging series 
for its inverse. By M. E. WISE. 

On the levels of significance of the incomplete Beta Function and the F-distribution. 
By L. A. AROIAN. 


On the generalized second limit-theorem in the calculus of probabilities. By D. G.
KENDALL and K. S. RAO.


A note on the cumulants of Kendall’s S-distribution. By H. SILVERSTONE.
The comparison of percentages in matched samples. By W. G. COCHRAN.


The distribution of the variance-ratio in random samples of any size drawn from
non-normal universes. By A. K. GAYEN.


The exact partition of χ² and its application to the problem of the pooling of small
expectations. By H. O. LANCASTER.

Use of range in analysis of variance. By H. O. HARTLEY. 

On the comparison of estimators. By N. L. JOHNSON. 

A rapid method for ascertaining serial lag correlation. By G. D. GIBSON. 


Tables of the χ²-integral and of the cumulative Poisson distribution. By H. O. HARTLEY
and E. S. PEARSON.


The maximum F-ratio as a short-cut test for heterogeneity of variance. By H. O. 
HARTLEY. 

On the sequential t-test. By S. RUSHTON. 

Properties of some tests in sequential analysis. By A. G. BAKER. 

The unbiassed estimation of heterogeneous error variance. By A. S. C. EHRENBERG.


Sampling theory of the negative binomial and logarithmic series distributions.
By F. J. ANSCOMBE.


On questions raised by the combinations of tests based on discontinuous distribu-
tions. By E. S. PEARSON.

Significance of difference between the means of two non-normal samples. By A. K. 
GAYEN. 

Testing for serial correlation in least squares regression—I. By J. DURBIN and
G. S. WATSON.


Distribution of ‘Student’-Fisher’s t in samples from compound normal functions. 
By H. HYRENIUS. 

MISCELLANIA 
The comparison of pairs of treatments in split-plot experiments. By J. TAYLOR. 
On the best unbiassed quadratic estimate of the variance. By H. NAGLER. 
The cumulants of the first n natural numbers. By A. STUART. 
Note on the χ² smooth test. By D. A. S. FRASER.
An alternative form of χ². By F. N. DAVID.


On a theorem concerning the secondary subscripts of deviations in multi- 
variate correlation using Yule’s notation. By K. N. CHANDLER. 


REVIEWS 
“Statistics—Vol. I.” by N. L. Johnson and H. Tetley. By R. E. BEARD. 
“Probability and the Weighing of Evidence” by I. J. Good. By F. N. DAVID. 




A Major Name in 
TABULATING EFFICIENCY 


JOHN FELIX ASSOCIATES, Inc.


Staffed by Experts 
in Tabulating Techniques 


JOHN FELIX ASSOCIATES, Inc. 


Ready to Satisfy Your 
Marketing Survey Needs .. . 


JOHN FELIX ASSOCIATES, Inc.


Statistical and Tabulating Service


Nassau Street, New York 7, N. Y.




The Annals of Mathematical Statistics 


THE OFFICIAL JOURNAL OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


CONTENTS 


RAGHU RAJ BAHADUR and HERBERT ROBBINS


Distributions Related to Comparison of Two Means and Two Regression
Coefficients ........ UTTAM CHAND


The Extremal Quotient ........ E. J. GUMBEL and R. D. KEENEY


On a Preliminary Test for Pooling Mean Squares in the Analysis of
Variance


Estimating the Mean and Variance of Normal Populations from Singly
Truncated and Doubly Truncated Samples ........ A. C. COHEN, JR.


The Asymptotic Properties of Estimates of the Parameters of a Single
Equation in a Complete System of Stochastic Equations ........
T. W. ANDERSON and HERMAN RUBIN


Some Nonparametric Tests of Whether the Largest Observations of a
Set are Too Large or Too Small ........ JOHN E. WALSH


On a Measure of Dependence between Two Random Variables ........
NILS BLOMQVIST


Some Two Sample Tests ........ DOUGLAS G. CHAPMAN


Transformations Related to the Angular and the Square Root ........
MURRAY F. FREEMAN and JOHN W. TUKEY


Remarks on the Article “On a Class of Distributions that Approach the
Normal Distribution Function” by George B. Dantzig ........ T. N. E. GREVILLE


Independence of Quadratic Forms in Normally Correlated Variables ........
YUKIYOSI KAWADA


Volume 21, No. 4, December, 1950 


Inquiries and subscription orders should be sent to C. H. Fischer, Secretary, 
The Institute of Mathematical Statistics, University of Michigan, Ann Arbor, 
Michigan.




Volume IV of 
STUDIES IN SOCIAL PSYCHOLOGY IN WORLD WAR II


Measurement 


and Prediction 


By SAMUEL STOUFFER et al. 
Editorially sponsored by the Social Science Research Council 


MEASUREMENT AND PRE-
DICTION, the final volume in this
series, is a highly scientific and de-
tailed report on the underlying
methods of research used in the
three preceding volumes.


The first part is a theoretical and 
empirical analysis of the problems 
of measurement. The second studies 
two major Army Research Branch 
efforts at prediction—in psychiatry 
and in forecasting postwar plans. 


The methodology is basic, and 
applicable to scientific research gen- 
erally. A challenge to conventional 
techniques in the field, MEASURE- 
MENT AND PREDICTION or- 
ganizes compactly a vast amount of 
fresh thinking on the problems of 
social science research. 

768 pages, $10.00 


The three preceding volumes 


“Without parallel in the history of the 
social sciences.”—Paul F. Lazarsfeld. 


“Today, when our preparations for de- 
fense have such critical importance, 
these volumes are particularly timely.” — 
General George C. Marshall. 


I. THE AMERICAN SOLDIER: Adjustment
During Army Life. $7.50
II. THE AMERICAN SOLDIER: Combat
and its Aftermath. $7.50


(Volumes I and II together, $13.50)


III. EXPERIMENTS ON MASS COM-
MUNICATION. "Probably the most
thoroughgoing analysis of propaganda
ever written."—Newsweek $5.00


At your bookstore, PRINCETON UNIVERSITY PRESS 




THE ASSOCIATION ACKNOWLEDGES WITH DEEP 
APPRECIATION ITS INDEBTEDNESS TO THE 
INSTITUTIONAL MEMBERS WHO SUPPORT THE 
ASSOCIATION'S PROGRAM 


INSTITUTIONAL MEMBERS 
OF THE 


AMERICAN STATISTICAL ASSOCIATION. 


American Telephone & Telegraph Company, New York City 
Chrysler Corporation, Detroit, Michigan 

Deere & Company, Moline, Illinois 

Dun & Bradstreet, Inc., New York City 

Ford Motor Company, Dearborn, Michigan 

General Motors Corporation, Detroit, Michigan 

The McBee Company, Athens, Ohio 

Metropolitan Life Insurance Company, New York City 
Standard Oil Company (Indiana), Chicago, Illinois 
Sun Oil Company, Philadelphia, Pennsylvania 
Western Electric Company, New York City 




JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


CONTENTS OF VOLUME 45 
THE 109TH ANNUAL MEETING


MINUTES OF THE ANNUAL BUSINESS MEETING . . . . . 270

BOOK REVIEWS . . . . . . 140, 270, 454, 570
Bareven Norices ..... . «© « « « 48 
PUBLICATIONS RECEIVED . . . . . . 148, 309, 476, 586


INDEX TO VOLUME 45, 1950


ARTICLES, BY AUTHOR . . . . . . .


BOOK REVIEWS, BY AUTHOR . . . . . . 692


REPORTS AND OFFICIAL NOTICES 


REPORT OF THE COMMITTEE ON FELLOWS . . . . . 270 
REPORT OF THE BOARD OF DIRECTORS . . . . . . . 272
REPORT OF ACTIONS TAKEN BY THE COUNCIL . . . . . 273
REPORT OF COMMITTEE ACTIVITIES . . . . . . . . 227
REPORTS OF THE SECRETARY AND TREASURER . . . . . 278
REPORT OF THE COMMITTEE ON ELECTIONS . . . . . . 278
REPORT OF THE RESOLUTIONS COMMITTEE . . . . . . 279
REPORT OF THE COMMITTEE ON COMMITTEES . . . . . 279
REPORT OF THE SUBCOMMITTEE ON ASSOCIATION SECTIONS . 281




INDEX TO VOLUME 45: 1950
ARTICLES 


AROIAN, LEO A., and LEVENE, HOWARD, The Effectiveness of Quality Control
Charts

BANCROFT, T. A., Probability Values for the Common Tests of Hypotheses
portunities for Women . . ‘ 

BERKSON, JOSEPH, Are There Two Regressions?

BIRNBAUM, Z. W., and SIRKEN, MONROE G., Bias Due to Non-Availability
in Sampling Surveys

BOGUE, DONALD J., A Technique for Making Extensive Population Estimates

BROSS, IRWIN, Two-Choice Selection

CHUNG, J. H., Sequential Sampling from Finite Lots When the Proportion
Defective Is Small

COHEN, A. C., JR., Estimating Parameters of Pearson Type III Populations
from Truncated Samples

CORNFIELD, JEROME, and MANTEL, NATHAN, Some New Aspects of the
Application of Maximum Likelihood to the Calculation of the Dosage
Response Curve

CORNFIELD, JEROME, and MANTEL, NATHAN, Correction to “Some New
Aspects of the Application of Maximum Likelihood to the Calculation of the
Dosage Response Curve”

DANIEL, CUTHBERT, and HEEREMA, NICHOLAS, Design of Experiments for
Most Precise Slope Estimation or Linear Extrapolation

DAVIS, JOSEPH S., Population and Resources

DORN, HAROLD F., Pitfalls in Population Forecasts and Projections

DUBLIN, LOUIS I., Alfred James Lotka, 1880-1949

FREUND, JOHN E., The Degree of Stereotypy

GILLMAN, LEONARD, Operations Analysis and the Theory of Games: An
Advertising Example

GOODMAN, ROE, and KISH, LESLIE, Controlled Selection—A Technique in
Probability Sampling

GREENBERG, B. G., WRIGHT, JOHN J., and SHEPS, CECIL G., A Technique
for Analyzing Some Factors Affecting the Incidence of Syphilis

GREENWOOD, JOSEPH A., and SANDOMIRE, MARION M., Sample Size
Required for Estimating the Standard Deviation as a Per Cent of Its True
Value

HAMMER, PRESTON C., Interference with a Controlled Process

HAUSER, PHILIP M., and PEARL, ROBERT B., Who Are the Unemployed?

HEEREMA, NICHOLAS, and DANIEL, CUTHBERT, Design of Experiments for
Most Precise Slope Estimation or Linear Extrapolation

JASNY, NAUM, International Organizations and Soviet Statistics

JOHNSON, PALMER O., The Quantification of Qualitative Data in Discriminant
Analysis

KARMEL, P. H., A Note on P. K. Whelpton’s Calculation of Parity Adjusted
Reproduction Rates




KISH, LESLIE, and GOODMAN, ROE, Controlled Selection—A Technique in
Probability Sampling

KUZNETS, SIMON, Conditions of Statistical Research

LEONG, Y. S., Index of the Physical Volume Production of Minerals, 1880-1948

LEVENE, HOWARD, and AROIAN, LEO A., The Effectiveness of Quality Control
Charts

MADOW, LILLIAN H., On the Use of the County as the Primary Sampling Unit
for State Estimates

MALZBERG, BENJAMIN, Horatio Milo Pollock, 1868-1950

MANTEL, NATHAN, and CORNFIELD, JEROME, Some New Aspects of the
Application of Maximum Likelihood to the Calculation of the Dosage
Response Curve

MANTEL, NATHAN, and CORNFIELD, JEROME, Correction to “Some New
Aspects of the Application of Maximum Likelihood to the Calculation of the
Dosage Response Curve”

MARKS, ELI S., and MAULDIN, W. PARKER, Response Errors in Census
Research

MAULDIN, W. PARKER, and MARKS, ELI S., Response Errors in Census
Research

METZNER, CHARLES A., An Application of Scaling to Questionnaire
Construction

MORRISON, WINIFRED J., and SHERMAN, JACK, Simplified Procedures for
Fitting a Gompertz Curve and a Modified Exponential Curve

NOTESTEIN, FRANK W., The Population of the World in the Year 2000

PEARL, ROBERT B., and HAUSER, PHILIP M., Who Are the Unemployed?

POLITZ, ALFRED, and SIMMONS, WILLARD, Note on “An Attempt to Get Not-
at-Homes into the Sample Without Callbacks”

SANDOMIRE, MARION M., and GREENWOOD, JOSEPH A., Sample Size
Required for Estimating the Standard Deviation as a Per Cent of Its True
Value

SHAPIRO, SAM, Estimating Birth Registration Completeness

SHEPS, CECIL G., GREENBERG, B. G., and WRIGHT, JOHN J., A Technique
for Analyzing Some Factors Affecting the Incidence of Syphilis

SHERMAN, JACK, and MORRISON, WINIFRED J., Simplified Procedures for
Fitting a Gompertz Curve and a Modified Exponential Curve

SIMMONS, WILLARD, and POLITZ, ALFRED, Note on “An Attempt to Get Not-
at-Homes into the Sample Without Callbacks”

SIRKEN, MONROE G., and BIRNBAUM, Z. W., Bias Due to Non-Availability
in Sampling Surveys

SMITH, H. FAIRFIELD, Estimating Precision of Measuring Instruments

SPROWLS, R. CLAY, Statistical Decisions by the Method of Minimum Risks:
An Application

WALSH, JOHN E., Correction to “On the Best Choice of Sample Sizes for a
t-Test When the Ratio of Variances is Known”

WALSH, JOHN E., Large Sample Tests and Confidence Intervals for Mortality
Rates

WELCH, EMMETT H., A Note on the Derivation of Income Estimates by Source of
Income of Persons Making Less than $500 per Annum, 1944-1948




WHELPTON, P. K., Comments on Mr. Karmel’s Note

TUKEY, JOHN W., Some Sampling Simplified

WILLIAMS, C. ARTHUR, JR., On the Choice of the Number and Width of Classes
for the Chi-Square Test of Goodness of Fit

WRIGHT, JOHN J., GREENBERG, B. G., and SHEPS, CECIL G., A Technique
for Analyzing Some Factors Affecting the Incidence of Syphilis

ZOBEL, SIGMUND P., On the Measurement of the Productivity of Labor


BOOK REVIEWS 


Acceptance Sampling: A Symposium Given at the 105th Annual Meeting,
American Statistical Association . . . H. A. FREEMAN
ALBRITTON, ERRETT C., Experimental Design and Judgement of Evidence
. . . MARGARET MERRELL
ARLEY, NIELS, and BUCH, K. RANDER, Introduction to the Theory of
Probability and Statistics . . . JOHN E. FREUND
BERKELEY, EDMUND C., Giant Brains or Machines That Think
. . . N. METROPOLIS
BRATT, ELMER CLARK, Business Cycles and Forecasting, Third Edition
. . . JOSEPH A. SCHUMPETER
BROWNLEE, K. A., Industrial Experimentation (Third American Edition)
. . . GEORGE W. SCHUSTEK
BUCH, K. RANDER, and ARLEY, NIELS, Introduction to the Theory of
Probability and Statistics . . . JOHN E. FREUND
BUTTERS, J. KEITH, Assisted by NILAND, POWELL, Effects of Taxation,
Inventory Accounting and Policies . . . HAROLD BENJAMIN
CHAPANIS, ALPHONSE, GARNER, WENDELL R., and MORGAN, CLIFFORD T.,
Applied Experimental Psychology . . . REUBEN L. REVANS
COCHRAN, WILLIAM G., and COX, GERTRUDE M., Experimental Designs
. . . PALMER O. JOHNSON
COX, GERTRUDE M., and COCHRAN, WILLIAM G., Experimental Designs
. . . PALMER O. JOHNSON
DEARING, CHARLES L., and OWEN, WILFRED, National Transportation
Policy . . . PAUL H. ANDERSON
DUBLIN, LOUIS I., LOTKA, ALFRED J., and SPIEGELMAN, MORTIMER, Length
of Life, A Study of the Life Table, Revised Edition
. . . HAROLD F. DORN
ESTEY, JAMES ARTHUR, Business Cycles: Their Nature, Cause, and Control
. . . ALVIN MAYNE
. . . MELVILLE J.
GARNER, WENDELL R., CHAPANIS, ALPHONSE, and MORGAN, CLIFFORD T.,
Applied Experimental Psychology . . . REUBEN L. REVANS
GREIG, GERTRUD BERTA, Seasonal Fluctuations in Employment in the
Women’s Clothing Industry in New York . . . LAZARE TEPER
HAINES, GEORGE H., Workbook for Business and Economic Statistics
. . . DONALD W. PADEN
HALL, MARGUERITE F., Public Health Statistics, 2nd Edition, Revised



Historical Statistics of the United States, 1789-1945 . . . W. S. WOYTINSKY
HYMAN, HERBERT, MOSTELLER, FREDERICK, McCARTHY, PHILIP J.,
MARKS, ELI S., TRUMAN, DAVID B. (and Collaborators), The Pre-
Election Polls of 1948 . . . LOUIS H. BEAN
INTERNATIONAL ECONOMICS DIVISION, OFFICE OF BUSINESS ECONOMICS,
U. S. DEPARTMENT OF COMMERCE, The Balance of International
Payments of the United States, 1946-1948 . . . ROUN F. BENNETT
INTERNATIONAL MONETARY FUND, Balance of Payments Yearbook, 1938,
1946, 1947 . . . MYRTLE BRICKMAN
KAFKA, F., Statistics without Numbers; a Visual Explanation
. . . C. JANNE
JENKINSON, BRUCE L., Bureau of the Census Manual of Tabular
Presentation . . . RAY OVID HALL
KNEALE, WILLIAM, Probability and Induction . . . JOHN H. SMITH
LEWIS, DON, Quantitative Methods in Psychology . . . PHILIP DESIND
LOTKA, ALFRED J., DUBLIN, LOUIS I., and SPIEGELMAN, MORTIMER, Length
of Life, A Study of the Life Table, Revised Edition . . . HAROLD F. DORN
MAIR, GEORGE F., Editor, Studies in Population . . . JEAN CLAIRE BOWMAN
MANN, H. B., Analysis and Design of Experiments . . . ANDREW G. CLARK
MARKS, ELI S., MOSTELLER, FREDERICK, HYMAN, HERBERT, McCARTHY,
PHILIP J., TRUMAN, DAVID B. (and Collaborators), The Pre-Election
Polls of 1948 . . . LOUIS H. BEAN
MARONEY, MILDRED, MERIAM, LEWIS, and SCHLOTTERBECK, KARL T.,
The Cost and Financing of Social Security . . . H. W. STEINHAUS
MATHER, K., Biometrical Genetics . . . JAMES F. CROW
McCARTHY, PHILIP J., MOSTELLER, FREDERICK, HYMAN, HERBERT,
MARKS, ELI S., TRUMAN, DAVID B. (and Collaborators), The Pre-
Election Polls of 1948 . . . LOUIS H. BEAN
MERIAM, LEWIS, SCHLOTTERBECK, KARL T., MARONEY, MILDRED, The
Cost and Financing of Social Security . . . H. W. STEINHAUS
MOORE, GEOFFREY H., Statistical Indicators of Cyclical Revivals and
Recessions . . . WALTER E. HOADLEY, JR.
MORGAN, CLIFFORD T., CHAPANIS, ALPHONSE, and GARNER, WENDELL R.,
Applied Experimental Psychology . . . REUBEN L. REVANS
MORGENSTERN, OSCAR, On the Accuracy of Economic Observations
. . . SIMON KUZNETS
MOSTELLER, FREDERICK, HYMAN, HERBERT, McCARTHY, PHILIP J.,
MARKS, ELI S., TRUMAN, DAVID B. (and Collaborators), The Pre-
Election Polls of 1948 . . . LOUIS H. BEAN
NATIONAL BUREAU OF ECONOMIC RESEARCH (Publisher), Studies in Income
and Wealth, Volume Twelve . . . JOHN B. CANNING
NEYMAN, JERZY, Editor, Proceedings of the Berkeley Symposium on
Mathematical Statistics and Probability . . . C. C. CRAIG
NILAND, POWELL, and BUTTERS, J. KEITH, Effects of Taxation, Inventory
Accounting and Policies . . . HAROLD BENJAMIN
OWEN, WILFRED, and DEARING, CHARLES L., National Transportation
Policy . . . PAUL H. ANDERSON
PEASE, KATHARINE, Machine Computation of Elementary Statistics
. . . ALBERT H. BOWKER




REICHENBACH, HANS, The Theory of Probability: An Inquiry into the Logical
and Mathematical Foundations of the Calculus of Probability. Second
Edition . . . JOHN H. SMITH

ROUND TABLE ON INTERNATIONAL STATISTICS, Problems in the Collection
and Comparability of International Statistics . . . P. J. LOFTUS

RUTHERFORD, JOHN G., Quality Control in Industry: Methods and Systems
. . . W. R. PABST, JR.

SCHLOTTERBECK, KARL T., MERIAM, LEWIS, and MARONEY, MILDRED, The
Cost and Financing of Social Security . . . H. W. STEINHAUS

SPIEGELMAN, MORTIMER, DUBLIN, LOUIS I., and LOTKA, ALFRED J.,
Length of Life, A Study of the Life Table, Revised Edition
. . . HAROLD F. DORN

STATISTICAL OFFICE OF THE UNITED NATIONS IN COLLABORATION WITH THE
DEPARTMENT OF SOCIAL AFFAIRS, Demographic Yearbook, 1948

STIGLER, GEORGE J., Employment and Compensation in Education
. . . F. G. CORNELL

TRUMAN, DAVID B., MOSTELLER, FREDERICK, HYMAN, HERBERT,
McCARTHY, PHILIP J., and MARKS, ELI S. (and Collaborators), The Pre-
Election Polls of 1948 . . . LOUIS H. BEAN

UNIVERSITIES—NATIONAL BUREAU COMMITTEE ON ECONOMIC RESEARCH,
Problems in the Study of Economic Growth . . . M. M. KNIGHT




The Leader in Its Field—


APPLIED GENERAL STATISTICS 


By Frederick E. Croxton, Columbia University, and Dudley J.
Cowden, University of North Carolina

A basic study that gives a more comprehensive treatment of statistical
methods than any other text in the field. Among essential topics covered
are: statistical reliability and significance, fitting of normal curves,
binomials and skewed curves, analysis of variance, changing seasonals and
growth curves. A complete discussion of correlation includes linear,
nonlinear, and multiple correlation. Among valuable features of the book:

• Presentation—simple, forthright language, holding technical terminol-

• lends itself to solution of specific problems arising in different fields.

the z distribution, F for the analysis
of variance, etc.

• Teaching and study aids—257 diagrams, 190 tables,
mathematical

944 pages

Companion Volume—


WORKBOOK
in APPLIED GENERAL STATISTICS


flexibility in emphasizing given topics or expanding the scope of a course.

Several are totally new, while all use data that are
either new or brought up to date. All the necessary worksheets, graphic
and tabular forms, and formulas are supplied.

461 pp. plus 62 pp. forms


PRENTICE-HALL

70 Fifth Ave., New York


GEORGE BANTA PUBLISHING COMPANY, MENASHA, WISCONSIN


